The COG database: new developments in phylogenetic classification of proteins from complete genomes

Cited by: 1547
Authors
Tatusov, RL [1]
Natale, DA [1]
Garkavtsev, IV [1]
Tatusova, TA [1]
Shankavaram, UT [1]
Rao, BS [1]
Kiryutin, B [1]
Galperin, MY [1]
Fedorova, ND [1]
Koonin, EV [1]
Affiliations
[1] Natl Lib Med, Natl Ctr Biotechnol Informat, NIH, Bethesda, MD 20894 USA
DOI
10.1093/nar/29.1.22
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
The database of Clusters of Orthologous Groups of proteins (COGs), which represents an attempt at a phylogenetic classification of the proteins encoded in complete genomes, currently consists of 2791 COGs including 45 350 proteins from 30 genomes of bacteria, archaea and the yeast Saccharomyces cerevisiae (http://www.ncbi.nlm.nih.gov/COG). In addition, a supplement to the COGs is available, in which proteins encoded in the genomes of two multicellular eukaryotes, the nematode Caenorhabditis elegans and the fruit fly Drosophila melanogaster, and shared with bacteria and/or archaea were included. The new features added to the COG database include information pages with structural and functional details on each COG and literature references, improvements of the COGNITOR program that is used to fit new proteins into the COGs, and classification of genomes and COGs constructed by using principal component analysis.
Pages: 22-28
Page count: 7