Quantitative assessment of protein function prediction from metagenomics shotgun sequences

Cited by: 52
Authors
Harrington, E. D.
Singh, A. H.
Doerks, T.
Letunic, I.
von Mering, C.
Jensen, L. J.
Raes, J.
Bork, P.
Affiliations
[1] European Mol Biol Lab, Struct & Computat Biol Unit, D-69117 Heidelberg, Germany
[2] Max Delbruck Ctr Mol Med, D-13092 Berlin, Germany
Keywords
fatty acid; heme; neighborhood; environmental genomics; metagenome annotation;
DOI
10.1073/pnas.0702636104
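Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract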
To assess the potential of protein function prediction in environmental genomics data, we analyzed shotgun sequences from four diverse and complex habitats. Using homology searches as well as customized gene neighborhood methods that incorporate intergenic and evolutionary distances, we inferred specific functions for 76% of the 1.4 million predicted ORFs in these samples (83% when nonspecific functions are considered). Surprisingly, these fractions are only slightly smaller than the corresponding ones in completely sequenced genomes (83% and 86%, respectively, by using the same methodology) and considerably higher than previously thought. For as many as 75,448 ORFs (5% of the total), only neighborhood methods can assign functions, illustrated here by a previously undescribed gene associated with the well characterized heme biosynthesis operon and a potential transcription factor that might regulate a coupling between fatty acid biosynthesis and degradation. Our results further suggest that, although functions can be inferred for most proteins on earth, many functions remain to be discovered in numerous small, rare protein families.
Pages: 13913-13918
Number of pages: 6