OrthoMCL: Identification of ortholog groups for eukaryotic genomes

Cited by: 4492
Authors
Li, L
Stoeckert, CJ
Roos, DS [1]
Affiliations
[1] Univ Penn, Ctr Bioinformat, Dept Biol, Philadelphia, PA 19104 USA
[2] Univ Penn, Ctr Bioinformat, Dept Genet, Philadelphia, PA 19104 USA
[3] Univ Penn, Genom Inst, Philadelphia, PA 19104 USA
DOI
10.1101/gr.1224503
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
The identification of orthologous groups is useful for genome annotation, studies on gene/protein evolution, comparative genomics, and the identification of taxonomically restricted sequences. Methods successfully exploited for prokaryotic genome analysis have proved difficult to apply to eukaryotes, however, as larger genomes may contain multiple paralogous genes, and sequence information is often incomplete. OrthoMCL provides a scalable method for constructing orthologous groups across multiple eukaryotic taxa, using a Markov Cluster algorithm to group (putative) orthologs and paralogs. This method performs similarly to the INPARANOID algorithm when applied to two genomes, but can be extended to cluster orthologs from multiple species. OrthoMCL clusters are coherent with groups identified by EGO, but improved recognition of "recent" paralogs permits overlapping EGO groups representing the same gene to be merged. Comparison with previously assigned EC annotations suggests a high degree of reliability, implying utility for automated eukaryotic genome annotation. OrthoMCL has been applied to the proteome data set from seven publicly available genomes (human, fly, worm, yeast, Arabidopsis, the malaria parasite Plasmodium falciparum, and Escherichia coli). A Web interface allows queries based on individual genes or user-defined phylogenetic patterns (http://www.cbil.upenn.edu/gene-family). Analysis of clusters incorporating P. falciparum genes identifies numerous enzymes that were incompletely annotated in first-pass annotation of the parasite genome.
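The abstract names a Markov Cluster (MCL) algorithm as the grouping step but gives no detail. As a rough illustration only, the generic MCL iteration (expansion by matrix squaring, then inflation by element-wise powering and column renormalization) can be sketched as below. This is not the OrthoMCL implementation: the protein names, the edge weights, the inflation value of 1.5, and the helper functions `build_matrix`, `normalize_columns`, `mat_mul`, and `mcl` are all assumptions made for the example.

```python
def build_matrix(n, edges):
    """Build a symmetric n x n weight matrix from (i, j, weight) triples."""
    m = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        m[i][j] = w
        m[j][i] = w
    return m


def normalize_columns(matrix):
    """Scale every column so that it sums to 1 (column-stochastic)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def mat_mul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(weights, inflation=1.5, max_iter=50, tol=1e-9):
    """Generic MCL sketch: returns clusters as sorted lists of node indices."""
    n = len(weights)
    # Self-loops are a common MCL convention; assumed here, not taken from the paper.
    m = [[weights[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = mat_mul(m, m)                          # expansion
        inflated = normalize_columns(                     # inflation
            [[v ** inflation for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that still carries flow defines one cluster (its nonzero columns).
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical proteins and similarity weights, invented for illustration.
names = ["hsa_A", "dme_A", "cel_A", "hsa_B", "sce_B"]
demo_weights = build_matrix(5, [(0, 1, 1.0), (0, 2, 0.8), (1, 2, 0.9),
                                (3, 4, 1.0), (2, 3, 0.1)])
for cluster in mcl(demo_weights):
    print([names[i] for i in cluster])
```

Raising the inflation parameter tends to give smaller, tighter clusters, while lowering it tends to merge them; the value used here is arbitrary.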
Pages: 2178-2189
Page count: 12