Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium

Cited by: 708
Authors
Gaudet, Pascale
Livstone, Michael S. [1]
Lewis, Suzanna E. [2]
Thomas, Paul D. [3]
Affiliations
[1] Princeton Univ, Genome Databases Grp, Princeton, NJ 08544 USA
[2] Lawrence Berkeley Natl Lab, Berkeley, CA USA
[3] Univ So Calif, Div Bioinformat, Dept Prevent Med, Los Angeles, CA 90089 USA
Keywords
gene ontology; genome annotation; reference genome; gene function prediction; phylogenetics; TREES;
DOI
10.1093/bib/bbr042
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
The goal of the Gene Ontology (GO) project is to provide a uniform way to describe the functions of gene products from organisms across all kingdoms of life and thereby enable analysis of genomic data. Protein annotations are either based on experiments or predicted from protein sequences. Since most sequences have not been experimentally characterized, most available annotations need to be based on predictions. To make inferences as accurate as possible, the GO Consortium's Reference Genome Project is using an explicit evolutionary framework to infer annotations of proteins from a broad set of genomes from experimental annotations in a semi-automated manner. Most components in the pipeline, such as selection of sequences, building multiple sequence alignments and phylogenetic trees, retrieving experimental annotations and depositing inferred annotations, are fully automated. However, the most crucial step in our pipeline relies on software-assisted curation by an expert biologist. This curation tool, Phylogenetic Annotation and INference Tool (PAINT), helps curators to infer annotations among members of a protein family. PAINT allows curators to make precise assertions as to when functions were gained and lost during evolution and to record the evidence (e.g. experimentally supported GO annotations and phylogenetic information including orthology) for those assertions. In this article, we describe how we use PAINT to infer protein function in a phylogenetic context with emphasis on its strengths, limitations and guidelines. We also discuss specific examples showing how PAINT annotations compare with those generated by other widely used homology-based methods.
Pages: 449-462
Page count: 14