Synergistic use of plant-prokaryote comparative genomics for functional annotations

Cited by: 29
Authors
Gerdes, Svetlana [2, 3]
El Yacoubi, Basma [2]
Bailly, Marc [2]
Blaby, Ian K. [2]
Blaby-Haas, Crysten E. [2]
Jeanguenin, Linda [1]
Lara-Nunez, Aurora [1]
Pribat, Anne [1]
Waller, Jeffrey C. [1]
Wilke, Andreas [4]
Overbeek, Ross [3]
Hanson, Andrew D. [1]
de Crecy-Lagard, Valerie [2]
Affiliations
[1] Univ Florida, Dept Hort Sci, Gainesville, FL 32611 USA
[2] Univ Florida, Dept Microbiol & Cell Sci, Gainesville, FL 32611 USA
[3] Fellowship Interpretat Genomes, Burr Ridge, IL USA
[4] Univ Chicago, Computat Inst, Chicago, IL 60637 USA
Source
BMC GENOMICS | 2011 / Vol. 12
Keywords
MOLECULAR-IDENTIFICATION; OMEGA-AMIDASE; PROTEIN; DATABASE; GENES; ARABIDOPSIS; EXPRESSION; ENZYME; FAMILY; BIOSYNTHESIS;
DOI
10.1186/1471-2164-12-S1-S2
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Identifying functions for all gene products in all sequenced organisms is a central challenge of the post-genomic era. However, at least 30-50% of the proteins encoded by any given genome are of unknown or vaguely known function, and a large number are wrongly annotated. Many of these 'unknown' proteins are common to prokaryotes and plants. We set out to predict and experimentally test the functions of such proteins. Our approach to functional prediction integrates comparative genomics based mainly on microbial genomes with functional genomic data from model microorganisms and post-genomic data from plants. This approach bridges the gap between automated homology-based annotations and the classical gene discovery efforts of experimentalists, and is more powerful than purely computational approaches to identifying gene-function associations.
Results: Among Arabidopsis genes, we focused on those (2,325 in total) that (i) are unique or belong to families with no more than three members, (ii) occur in prokaryotes, and (iii) have unknown or poorly known functions. Computer-assisted selection of promising targets for deeper analysis was based on homology-independent characteristics associated in the SEED database with the prokaryotic members of each family. In-depth comparative genomic analysis was performed for 360 top candidate families. From this pool, 78 families were connected to general areas of metabolism and, of these families, specific functional predictions were made for 41. Twenty-one predicted functions have been experimentally tested or are currently under investigation by our group in at least one prokaryotic organism (nine of them have been validated, four invalidated, and eight are in progress). Ten additional predictions have been independently validated by other groups. Discovering the function of very widespread but hitherto enigmatic proteins such as the YrdC or YgfZ families illustrates the power of our approach.
Conclusions: Our approach correctly predicted functions for 19 uncharacterized protein families from plants and prokaryotes; none of these functions had previously been correctly predicted by computational methods. The resulting annotations could be propagated with confidence to over six thousand homologous proteins encoded in over 900 bacterial, archaeal, and eukaryotic genomes currently available in public databases.
Pages: 16