The COG database: an updated version includes eukaryotes

Cited by: 3515
Authors
Tatusov, RL [1]
Fedorova, ND
Jackson, JD
Jacobs, AR
Kiryutin, B
Koonin, EV
Krylov, DM
Mazumder, R
Mekhedov, SL
Nikolskaya, AN
Rao, BS
Smirnov, S
Sverdlov, AV
Vasudevan, S
Wolf, YI
Yin, JJ
Natale, DA
Affiliations
[1] Natl Lib Med, Natl Ctr Biotechnol Informat, NIH, Bethesda, MD 20894 USA
[2] Georgetown Univ, Med Ctr, Washington, DC 20007 USA
Keywords
DOI
10.1186/1471-2105-4-41
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system based on orthologous relationships between genes appears to be a natural framework for comparative genomics and should facilitate both functional annotation of genomes and large-scale evolutionary studies.

Results: We describe here a major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes and the construction of clusters of predicted orthologs for 7 eukaryotic genomes, which we named KOGs after eukaryotic orthologous groups. The COG collection currently consists of 138,458 proteins, which form 4873 COGs and comprise 75% of the 185,505 (predicted) proteins encoded in 66 genomes of unicellular organisms. The eukaryotic orthologous groups (KOGs) include proteins from 7 eukaryotic genomes: three animals (the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster and Homo sapiens), one plant, Arabidopsis thaliana, two fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe), and the intracellular microsporidian parasite Encephalitozoon cuniculi. The current KOG set consists of 4852 clusters of orthologs, which include 59,838 proteins, or ~54% of the analyzed eukaryotic 110,655 gene products. Compared to the coverage of the prokaryotic genomes with COGs, a considerably smaller fraction of eukaryotic genes could be included into the KOGs; addition of new eukaryotic genomes is expected to result in substantial increase in the coverage of eukaryotic genomes with KOGs. Examination of the phyletic patterns of KOGs reveals a conserved core represented in all analyzed species and consisting of ~20% of the KOG set. This conserved portion of the KOG set is much greater than the ubiquitous portion of the COG set (~1% of the COGs). In part, this difference is probably due to the small number of included eukaryotic genomes, but it could also reflect the relative compactness of eukaryotes as a clade and the greater evolutionary stability of eukaryotic genomes.

Conclusion: The updated collection of orthologous protein sets for prokaryotes and eukaryotes is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and genome-wide evolutionary studies.
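The coverage figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the four counts are taken from the abstract, while the function name `coverage` and the variable names are assumptions introduced here and are not part of the COG/KOG resource.

```python
# Protein counts as quoted in the abstract.
COG_PROTEINS_IN_CLUSTERS = 138_458   # proteins assigned to COGs
COG_PROTEINS_TOTAL = 185_505         # predicted proteins in the 66 genomes
KOG_PROTEINS_IN_CLUSTERS = 59_838    # proteins assigned to KOGs
KOG_PROTEINS_TOTAL = 110_655         # analyzed eukaryotic gene products


def coverage(included: int, total: int) -> float:
    """Return the percentage of proteins that fall into orthologous clusters."""
    return 100.0 * included / total


# Hypothetical helper names; the values are derived only from the counts above.
cog_coverage = coverage(COG_PROTEINS_IN_CLUSTERS, COG_PROTEINS_TOTAL)
kog_coverage = coverage(KOG_PROTEINS_IN_CLUSTERS, KOG_PROTEINS_TOTAL)

print(f"COG coverage: {cog_coverage:.1f}%")
print(f"KOG coverage: {kog_coverage:.1f}%")
```

Rounded to whole percentages, the two values agree with the 75% and ~54% stated in the abstract, and the gap between them is the "considerably smaller fraction" of eukaryotic genes that the authors mention.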
Pages: 14