Automatic extraction of gene ontology annotation and its correlation with clusters in protein networks

Cited by: 28
Authors
Daraselia, Nikolai [1]
Yuryev, Anton [1]
Egorov, Sergei [1]
Mazo, Ilya [1]
Ispolatov, Iaroslav [1]
Affiliation
[1] Ariadne Genomics Inc, Rockville, MD 20850 USA
Source
BMC BIOINFORMATICS | 2007 / Vol. 8
Keywords
FUNCTIONAL MODULES; BIOLOGICAL PROCESS; PREDICTION; CATEGORIZATION; IDENTIFICATION; SIMILARITY; RETRIEVAL; COMPLEXES; TOPOLOGY; SEQUENCE
DOI
10.1186/1471-2105-8-243
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Uncovering the cellular roles of a protein is a task of tremendous importance and complexity that requires dedicated experimental work as well as, often, sophisticated data mining and processing tools. Protein functions, often referred to as protein annotations, are believed to manifest themselves through the topology of inter-protein interaction networks. In particular, there is a growing body of evidence that proteins performing the same function are more likely to interact with each other than with proteins that have other functions. However, because functional annotation and protein network topology are often studied separately, the direct relationship between them has not been comprehensively demonstrated. Beyond its general biological significance, such a demonstration would further validate the data extraction and processing methods used to compile protein annotation and protein-protein interaction datasets.

Results: We developed a method for automatically extracting protein functional annotation from scientific text, based on Natural Language Processing (NLP) technology. For the protein annotation extracted from the entire PubMed, we evaluated precision and recall rates and compared the performance of automatic extraction with that of the manual curation used in public Gene Ontology (GO) annotation. In the second part of this work, we report a large-scale investigation of the correspondence between communities in literature-based protein networks and GO annotation groups of functionally related proteins. We found a comprehensive two-way match: proteins within biological annotation groups form network clusters that are significantly more densely linked than expected by chance and, conversely, densely linked network communities exhibit a pronounced non-random overlap with GO groups. We also expanded the publicly available GO biological process annotation using the relations extracted by our NLP technology. An increase in the number and size of GO groups without any noticeable decrease in link density within the groups indicated that this expansion significantly broadens the public GO annotation without diluting its quality. We found that functional GO annotation correlates mostly with clustering in the physical interaction protein network, while its overlap with communities in the indirect regulatory network is two to three times smaller.

Conclusion: Protein functional annotations extracted by NLP technology expand and enrich the existing GO annotation system. GO functional modularity correlates mostly with clustering in the physical interaction network, suggesting an essential role for the structural organization maintained by these interactions. Reciprocally, the clustering of proteins in physical interaction networks can serve as evidence of their functional similarity.
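The abstract's central test, whether proteins in an annotation group are more densely linked than expected by chance, can be illustrated with a minimal Python sketch. It assumes an undirected network given as a list of protein pairs, defines link density as the fraction of possible within-group pairs that are linked, and uses a null model that draws random protein sets of the same size uniformly from the network. That null model is an illustrative assumption; the paper's actual randomization and statistics may differ (degree-preserving rewiring is a common alternative). The function names `link_density` and `density_enrichment` and the toy protein names are hypothetical.

```python
import random


def link_density(edges, group):
    """Fraction of all possible pairs within `group` that are linked.

    `edges` is a set of frozenset({u, v}) pairs (undirected, no self-loops).
    """
    members = set(group)
    n = len(members)
    if n < 2:
        return 0.0
    # An edge is internal when both of its endpoints belong to the group.
    internal = sum(1 for e in edges if e <= members)
    return internal / (n * (n - 1) / 2)


def density_enrichment(edge_list, group, trials=1000, seed=0):
    """Compare a group's link density with random protein sets of equal size.

    ASSUMPTION: the null model draws node sets uniformly at random; this is
    an illustrative choice, not necessarily the one used in the paper.
    Returns (observed_density, mean_random_density, empirical_p_value).
    """
    # Normalize to undirected, de-duplicated edges without self-loops.
    edges = {frozenset(e) for e in edge_list if e[0] != e[1]}
    nodes = sorted({p for e in edges for p in e})
    # Only proteins that appear in the network can contribute links.
    members = set(group) & set(nodes)
    k = len(members)

    observed = link_density(edges, members)
    rng = random.Random(seed)
    random_densities = [
        link_density(edges, rng.sample(nodes, k)) for _ in range(trials)
    ]
    # Add-one correction keeps the empirical p-value strictly positive.
    hits = sum(1 for d in random_densities if d >= observed)
    p_value = (hits + 1) / (trials + 1)
    mean_random = sum(random_densities) / trials
    return observed, mean_random, p_value


if __name__ == "__main__":
    # Hypothetical toy network: a 4-protein clique attached to a short chain.
    toy_edges = [
        ("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"),
        ("C", "D"), ("D", "E"), ("E", "F"), ("F", "G"), ("G", "H"),
    ]
    print(density_enrichment(toy_edges, ["A", "B", "C", "D"]))
```

In the toy network, the four-protein clique has a link density of 1.0, while random four-protein sets average a much lower density, so the empirical p-value is small. The reverse direction described in the abstract (network communities overlapping GO groups) would need a separate overlap statistic, which this sketch does not attempt.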
Pages: 17