Phylogenetic and Functional Assessment of Orthologs Inference Projects and Methods

Cited by: 289
Authors
Altenhoff, Adrian M. [1]
Dessimoz, Christophe
Affiliation
[1] Swiss Fed Inst Technol, Inst Computat Sci, Zurich, Switzerland
Keywords
IDENTIFICATION; DATABASE; CONSERVATION; SEQUENCE; GENOMICS; CLUSTERS; OMA
DOI
10.1371/journal.pcbi.1000262
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Accurate genome-wide identification of orthologs is a central problem in comparative genomics, a fact reflected by the numerous orthology identification projects developed in recent years. However, only a few reports have compared their accuracy, and several recent efforts have not yet been systematically evaluated. Furthermore, orthology is typically assessed only in terms of function conservation, despite Fitch's original, phylogeny-based definition. We collected and mapped the results of nine leading orthology projects and methods (COG, KOG, Inparanoid, OrthoMCL, Ensembl Compara, Homologene, RoundUp, EggNOG, and OMA) and two standard methods (bidirectional best-hit and reciprocal smallest distance). We systematically compared their predictions with respect to both phylogeny and function, using six different tests. This required mapping millions of sequences, handling hundreds of millions of predicted ortholog pairs, and computing tens of thousands of trees. In phylogenetic analysis, or in functional analysis where high specificity is required, we find that OMA and Homologene perform best. At lower functional specificity but higher coverage, OrthoMCL outperforms Ensembl Compara and, to a lesser extent, Inparanoid. Lastly, the large coverage of the recent EggNOG can be of interest for building broad functional groupings, but the method is not specific enough for phylogenetic or detailed functional analyses. In terms of general methodology, we observe that the more sophisticated tree reconstruction/reconciliation approach of Ensembl Compara was at times outperformed by pairwise comparison approaches, even in phylogenetic tests. Furthermore, we show that standard bidirectional best-hit often outperforms projects with more complex algorithms. Overall, the present study makes three contributions. First, it provides guidance for the broad community of orthology data users as to which database best suits their needs. Second, it introduces new methodology to verify orthology. And third, it sets performance standards for current and future approaches.
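The abstract names bidirectional best-hit (BBH) as one of the standard baseline methods. As an illustration only, the following minimal Python sketch shows the BBH criterion, assuming pairwise similarity scores (for example, alignment scores where higher is better) have already been computed in both search directions between two genomes. The function names and the toy data are hypothetical and are not taken from the paper's pipeline; ties are resolved by keeping the first-seen maximum, whereas a real pipeline would need an explicit tie policy.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring target gene.

    scores: dict mapping (query, target) -> similarity score (higher is better).
    On ties, the first target seen with the maximum score is kept (hypothetical policy).
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted pairs (a, b) where b is a's best hit and a is b's best hit."""
    a_best = best_hits(a_vs_b)  # best hit in genome B for each gene of genome A
    b_best = best_hits(b_vs_a)  # best hit in genome A for each gene of genome B
    return sorted((a, b) for a, b in a_best.items() if b_best.get(b) == a)


# Hypothetical toy scores; the two directions need not be symmetric.
a_vs_b = {("a1", "b1"): 250, ("a1", "b2"): 90,
          ("a2", "b2"): 180, ("a2", "b1"): 120,
          ("a3", "b1"): 60}
b_vs_a = {("b1", "a1"): 245, ("b1", "a2"): 118,
          ("b2", "a2"): 175, ("b2", "a1"): 88}

print(bidirectional_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1'), ('a2', 'b2')]
```

Here `a3` has `b1` as its best hit, but `b1`'s best hit is `a1`, so the pair (`a3`, `b1`) is rejected. This one-to-one reciprocity requirement is what makes BBH simple and fairly specific, at the cost of missing many-to-many orthology after lineage-specific duplications.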
Pages: 11