eggNOG v2.0: extending the evolutionary genealogy of genes with enhanced non-supervised orthologous groups, species and functional annotations

Cited by: 181
Authors
Muller, J. [1]
Szklarczyk, D. [1, 2]
Julien, P. [3]
Letunic, I. [1]
Roth, A. [4, 5]
Kuhn, M. [1]
Powell, S. [1]
von Mering, C. [4, 5]
Doerks, T. [1]
Jensen, L. J. [2]
Bork, P. [1, 6]
Affiliations
[1] European Mol Biol Lab, D-69117 Heidelberg, Germany
[2] Univ Copenhagen, Fac Hlth Sci, Novo Nordisk Fdn, Ctr Prot Res, DK-2200 Copenhagen N, Denmark
[3] Univ Lausanne, Ctr Integrat Genom, Lausanne, Switzerland
[4] Univ Zurich, CH-8057 Zurich, Switzerland
[5] Swiss Inst Bioinformat, CH-8057 Zurich, Switzerland
[6] Max Delbruck Ctr Mol Med, D-13092 Berlin, Germany
Keywords
MULTIPLE SEQUENCE ALIGNMENT; DATABASE; ALGORITHM; RESOURCE; CLUSTERS; DISPLAY; TREE;
DOI
10.1093/nar/gkp951
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The identification of orthologous relationships forms the basis for most comparative genomics studies. Here, we present the second version of the eggNOG database, which contains orthologous groups (OGs) constructed through identification of reciprocal best BLAST matches and triangular linkage clustering. We applied this procedure to 630 complete genomes (529 bacteria, 46 archaea and 55 eukaryotes), which is a 2-fold increase relative to the previous version. The pipeline yielded 224 847 OGs, including 9724 extended versions of the original COG and KOG. We computed OGs for different levels of the tree of life; in addition to the species groups included in our first release (i.e. fungi, metazoa, insects, vertebrates and mammals), we have now constructed OGs for archaea, fishes, rodents and primates. We automatically annotate the nonsupervised orthologous groups (NOGs) with functional descriptions, protein domains, and functional categories as defined initially for the COG/KOG database. In-depth analysis is facilitated by precomputed high-quality multiple sequence alignments and maximum-likelihood trees for each of the available OGs. Altogether, eggNOG covers 2 242 035 proteins (built from 2 590 259 proteins) and provides a broad functional description for at least 1 966 709 (88%) of them. Users can access the complete set of orthologous groups via a web interface at: http://eggnog.embl.de.
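The abstract names two steps, reciprocal best BLAST matches and triangular linkage clustering, without giving details. The following is a minimal Python sketch of that general idea, not the eggNOG pipeline itself: the function names `reciprocal_best_hits` and `triangle_linkage`, the `(genome, protein)` identifier scheme, and all scores are hypothetical, and the real procedure likely involves further steps that this abstract does not describe.

```python
from collections import defaultdict


def reciprocal_best_hits(hits):
    """Return protein pairs that are each other's best hit across two genomes.

    hits maps (query, target) -> score; each protein is a (genome, name) tuple.
    (Hypothetical input format, chosen for illustration only.)
    """
    best = {}  # (query, target genome) -> (score, best target in that genome)
    for (query, target), score in hits.items():
        if query[0] == target[0]:
            continue  # ignore hits within the same genome
        key = (query, target[0])
        if key not in best or score > best[key][0]:
            best[key] = (score, target)
    pairs = set()
    for (query, _genome), (_score, target) in best.items():
        back = best.get((target, query[0]))
        if back is not None and back[1] == query:
            pairs.add(frozenset((query, target)))
    return pairs


def triangle_linkage(pairs):
    """Merge triangles of reciprocal best hits that share a common edge."""
    neighbours = defaultdict(set)
    for pair in pairs:
        a, b = tuple(pair)
        neighbours[a].add(b)
        neighbours[b].add(a)

    # A triangle is three proteins, pairwise linked, hence from three genomes.
    triangles = set()
    for pair in pairs:
        a, b = tuple(pair)
        for c in neighbours[a] & neighbours[b]:
            triangles.add(frozenset((a, b, c)))
    triangles = list(triangles)

    parent = list(range(len(triangles)))  # union-find over triangle indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edge_owner = {}  # edge -> index of the first triangle containing it
    for idx, tri in enumerate(triangles):
        members = sorted(tri)
        for i in range(3):
            edge = frozenset((members[i], members[(i + 1) % 3]))
            if edge in edge_owner:
                parent[find(idx)] = find(edge_owner[edge])
            else:
                edge_owner[edge] = idx

    groups = defaultdict(set)
    for idx, tri in enumerate(triangles):
        groups[find(idx)] |= tri
    return sorted(sorted(group) for group in groups.values())


# Invented toy scores for four genomes A-D (not real BLAST output).
A1, A2 = ("A", "a1"), ("A", "a2")
B1, B2 = ("B", "b1"), ("B", "b2")
C1, D1 = ("C", "c1"), ("D", "d1")
one_way = {(A1, B1): 200, (A1, C1): 180, (B1, C1): 190, (A1, D1): 150,
           (B1, D1): 160, (A2, B2): 90, (A2, B1): 40}
hits = {}
for (p, q), s in one_way.items():
    hits[(p, q)] = s
    hits[(q, p)] = s

groups = triangle_linkage(reciprocal_best_hits(hits))
print(groups)
```

In this toy input the triangles (a1, b1, c1) and (a1, b1, d1) share the edge a1-b1 and are merged into one group, while the isolated pair a2-b2 forms no triangle and stays unclustered.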
Pages: D190-D195
Number of pages: 6