Literature mining and database annotation of protein phosphorylation using a rule-based system

Times cited: 60
Authors
Hu, ZZ [1]
Narayanaswamy, M
Ravikumar, KE
Vijay-Shanker, K
Wu, CH
Affiliations
[1] Georgetown Univ, Med Ctr, Dept Biochem & Mol Biol, Washington, DC 20057 USA
[2] Anna Univ, AU KBC Res Ctr, Madras 600044, Tamil Nadu, India
[3] Univ Delaware, Dept Comp & Informat Sci, Newark, DE 19716 USA
Funding
US National Institutes of Health
DOI
10.1093/bioinformatics/bti390
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: A large volume of experimental data on protein phosphorylation is buried in the fast-growing PubMed literature. Although this information is of great value, little of it is captured in databases because literature-based curation is laborious. Computational literature mining holds promise for facilitating database curation. Results: A rule-based system, RLIMS-P (Rule-based LIterature Mining System for Protein Phosphorylation), was used to extract protein phosphorylation information from MEDLINE abstracts. An annotation-tagged literature corpus developed at PIR was used to evaluate the system on two tasks: finding phosphorylation papers, and extracting phosphorylation objects (kinases, substrates and sites) from abstracts. RLIMS-P achieved a precision of 91.4% and a recall of 96.4% for paper retrieval, and a precision of 97.9% and a recall of 88.0% for extraction of substrates and sites. By coupling high recall for paper retrieval with high precision for information extraction, RLIMS-P facilitates literature mining and database annotation of protein phosphorylation.
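For readers unfamiliar with the two evaluation measures quoted in the abstract, the following is a minimal sketch of how precision and recall are conventionally computed from counts of true positives, false positives and false negatives. The function names and the example counts are hypothetical and are not taken from the paper. The F-measure shown is a standard derived quantity; the abstract does not report it, so the value computed here is only an illustration based on the published percentages.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from raw counts.

    tp: items the system returned that are correct
    fp: items the system returned that are wrong
    fn: correct items the system missed
    Assumes tp + fp > 0 and tp + fn > 0.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not from the paper).
p, r = precision_recall(tp=8, fp=2, fn=2)

# Percentages as reported in the abstract; the F-measure itself is derived
# here and is not a figure stated by the authors.
f_retrieval = f_measure(91.4, 96.4)   # paper retrieval
f_extraction = f_measure(97.9, 88.0)  # substrate and site extraction
print(f"retrieval F = {f_retrieval:.1f}, extraction F = {f_extraction:.1f}")
```

High recall matters most at the retrieval stage, where a missed paper cannot be recovered later, while high precision matters most at the extraction stage, where each wrong annotation costs curator time.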
Pages: 2759-2765
Page count: 7