High precision multi-genome scale reannotation of enzyme function by EFICAz

Cited by: 21
Authors
Arakaki, Adrian K.
Tian, Weidong
Skolnick, Jeffrey [1]
Affiliations
[1] Georgia Inst Technol, Sch Biol, Ctr Study Syst Biol, Atlanta, GA 30318 USA
[2] Harvard Univ, Sch Med, Dept Biol Chem & Mol Pharmacol, Boston, MA 02115 USA
DOI
10.1186/1471-2164-7-315
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification code
071005; 0836; 090102; 100705
Abstract
Background: The functional annotation of most genes in newly sequenced genomes is inferred from similarity to previously characterized sequences, an annotation strategy that often leads to erroneous assignments. We have reannotated 245 genomes using an updated version of EFICAz, a highly precise method for enzyme function prediction.
Results: Based on our three-field EC number predictions, we have obtained lower-bound estimates for the average enzyme content in Archaea (29%), Bacteria (30%) and Eukarya (18%). Most annotations added in KEGG from 2005 to 2006 agree with EFICAz predictions made in 2005. The coverage of EFICAz predictions is significantly higher than that of KEGG, especially for eukaryotes. Thousands of our novel predictions correspond to hypothetical proteins. We have identified a subset of 64 hypothetical proteins with low sequence identity to EFICAz training enzymes whose biochemical functions have recently been characterized, and we find that we correctly identified their three-field EC numbers in 96% of the cases and their four-field EC numbers in 84% of the cases. For two of these 64 hypothetical proteins, PA1167 from Pseudomonas aeruginosa, an alginate lyase (EC 4.2.2.3), and Rv1700 from Mycobacterium tuberculosis H37Rv, an ADP-ribose diphosphatase (EC 3.6.1.13), we have detected an annotation lag of more than two years in databases. We present two examples in which EFICAz predictions act as hypothesis generators for understanding the functional roles of hypothetical proteins: FLJ11151, a human protein overexpressed in cancer that EFICAz identifies as an endopolyphosphatase (EC 3.6.1.10), and MW0119, a protein of Staphylococcus aureus strain MW2 that we propose as a candidate virulence factor based on its EFICAz-predicted activity, sphingomyelin phosphodiesterase (EC 3.1.4.12).
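The abstract reports agreement separately at the three-field and four-field EC number level. As a rough illustration of what that distinction means, the Python sketch below compares predicted and reference EC numbers on their first three or four dot-separated fields. It is not the EFICAz method or its evaluation code. The function names (`ec_prefix`, `ec_agree`, `agreement_rate`) and the toy pairs are assumptions made up for this example, and the toy rates are not results from the paper.

```python
from typing import Iterable, Tuple


def ec_prefix(ec: str, fields: int) -> Tuple[str, ...]:
    """Return the first `fields` components of an EC number such as '4.2.2.3'."""
    parts = ec.strip().split(".")
    if len(parts) < fields:
        raise ValueError(f"{ec!r} has fewer than {fields} fields")
    prefix = tuple(parts[:fields])
    # A '-' marks an unspecified field (e.g. '3.6.1.-'), so it cannot be compared.
    if "-" in prefix:
        raise ValueError(f"{ec!r} is unspecified within the first {fields} fields")
    return prefix


def ec_agree(predicted: str, reference: str, fields: int) -> bool:
    """True if the two EC numbers match on their first `fields` fields."""
    return ec_prefix(predicted, fields) == ec_prefix(reference, fields)


def agreement_rate(pairs: Iterable[Tuple[str, str]], fields: int) -> float:
    """Fraction of (predicted, reference) pairs that agree at the given level."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no pairs to compare")
    hits = sum(ec_agree(pred, ref, fields) for pred, ref in pairs)
    return hits / len(pairs)


# Hypothetical toy data: the reference EC numbers are the four mentioned in the
# abstract, but the "predicted" values are invented for illustration only.
toy_pairs = [
    ("4.2.2.3", "4.2.2.3"),    # agrees at both levels
    ("3.6.1.13", "3.6.1.13"),  # agrees at both levels
    ("3.6.1.11", "3.6.1.10"),  # agrees on three fields, differs on the fourth
    ("3.1.3.12", "3.1.4.12"),  # differs on the third field
]

three_field = agreement_rate(toy_pairs, 3)
four_field = agreement_rate(toy_pairs, 4)
print(f"three-field agreement: {three_field:.2f}")
print(f"four-field agreement: {four_field:.2f}")
```

A four-field match implies a three-field match, so the three-field rate can never be lower than the four-field rate, which is consistent with the 96% and 84% figures quoted in the abstract.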
Pages: 18