Detecting non-orthology in the COGs database and other approaches grouping orthologs using genome-specific best hits

Cited by: 43
Authors
Dessimoz, Christophe [1]
Boeckmann, Brigitte
Roth, Alexander C. J.
Gonnet, Gaston H.
Affiliations
[1] ETH, Inst Computat Sci, CH-8092 Zurich, Switzerland
[2] CMU, Swiss Inst Bioinformat, CH-1211 Geneva, Switzerland
DOI
10.1093/nar/gkl433
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Correct orthology assignment is a critical prerequisite of numerous comparative genomics procedures, such as function prediction, construction of phylogenetic species trees and genome rearrangement analysis. We present an algorithm for the detection of non-orthologs that arise by mistake in current orthology classification methods based on genome-specific best hits, such as the COGs database. The algorithm works with pairwise distance estimates, rather than computationally expensive and error-prone tree-building methods. The accuracy of the algorithm is evaluated through verification of the distribution of predicted cases, case-by-case phylogenetic analysis and comparisons with predictions from other projects using independent methods. Our results show that a very significant fraction of the COG groups include non-orthologs: using conservative parameters, the algorithm detects non-orthology in a third of all COG groups. Consequently, sequence analysis sensitive to correct orthology assignments will greatly benefit from these findings.
Pages: 3309-3316
Page count: 8