eggNOG v4.0: nested orthology inference across 3686 organisms

Cited by: 482
Authors
Powell, Sean [1]
Forslund, Kristoffer [1]
Szklarczyk, Damian [2,3]
Trachana, Kalliopi [4]
Roth, Alexander [2,3]
Huerta-Cepas, Jaime [5,6]
Gabaldon, Toni [5,6]
Rattei, Thomas [7]
Creevey, Chris [8]
Kuhn, Michael [9]
Jensen, Lars J. [10]
von Mering, Christian [2,3]
Bork, Peer [1,11]
Affiliations
[1] European Mol Biol Lab, Computat Biol Unit, D-69117 Heidelberg, Germany
[2] Univ Zurich, CH-8057 Zurich, Switzerland
[3] Swiss Inst Bioinformat, Inst Mol Life Sci, CH-8057 Zurich, Switzerland
[4] Inst Syst Biol, Seattle, WA 98109 USA
[5] Ctr Genom Regulat, Bioinformat & Genom Programme, Barcelona 08003, Spain
[6] Univ Pompeu Fabra, Barcelona 08003, Spain
[7] Univ Vienna, Dept Microbiol & Ecosyst Sci, CUBE Div Computat Syst Biol, A-1090 Vienna, Austria
[8] Aberystwyth Univ, Inst Biol Environm & Rural Sci, Aberystwyth SY23 3FG, Ceredigion, Wales
[9] Tech Univ Dresden, Ctr Biotechnol, D-01062 Dresden, Germany
[10] Univ Copenhagen, Fac Hlth Sci, Novo Nordisk Fdn Ctr Prot Res, DK-2200 Copenhagen N, Denmark
[11] Max Delbruck Ctr Mol Med, D-13092 Berlin, Germany
Funding
UK Biotechnology and Biological Sciences Research Council (BBSRC)
Keywords
REFERENCE SEQUENCES REFSEQ; COMPARATIVE GENOMICS; DATABASE; GENES; ANNOTATION; TREE; RECONSTRUCTION; IDENTIFICATION; CONSTRUCTION; ARABIDOPSIS
DOI
10.1093/nar/gkt1253
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
With the increasing availability of various 'omics data, high-quality orthology assignment is crucial for evolutionary and functional genomics studies. Here we present the fourth version of the eggNOG database (available at http://eggnog.embl.de), which derives nonsupervised orthologous groups (NOGs) from complete genomes and then applies a comprehensive characterization and analysis pipeline to the resulting gene families. Compared with the previous version, we have more than tripled the underlying species set to cover 3686 organisms, keeping pace with genome project completions while prioritizing the inclusion of high-quality genomes to minimize error propagation from incomplete proteome sets. Major technological advances include (i) a robust and scalable procedure for the identification and inclusion of high-quality genomes, (ii) provision of orthologous groups for 107 different taxonomic levels compared with 41 in eggNOGv3, (iii) identification and annotation of particularly closely related orthologous groups, facilitating analysis of related gene families, (iv) improvements to the clustering and functional annotation approach, (v) adoption of a revised tree-building procedure based on the multiple alignments generated during the process and (vi) implementation of quality control procedures throughout the entire pipeline. As in previous versions, eggNOGv4 provides multiple sequence alignments and maximum-likelihood trees, as well as broad functional annotation. Users can access the complete database of orthologous groups via a web interface, as well as through bulk download.
Pages: D231-D239
Page count: 9