Estimating the number of protein folds and families from complete genome data

Cited by: 143
Authors
Wolf, YI
Grishin, NV
Koonin, EV
Affiliations
[1] NIH, Natl Ctr Biotechnol Informat, Natl Lib Med, Bethesda, MD 20894 USA
[2] Russian Acad Sci, Inst Cytol & Genet, Novosibirsk 630090, Russia
Keywords
protein structure classification; structural genomics; sampling; logarithmic distribution
DOI
10.1006/jmbi.2000.3786
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Using the data on proteins encoded in complete genomes, combined with a rigorous theory of the sampling process, we estimate the total number of protein folds and families, as well as the number of folds and families in each genome. The total number of folds in globular, water-soluble proteins is estimated at about 1000, with structural information currently available for about one-third of that number. The sequenced genomes of unicellular organisms encode from approximately 25%, for the minimal genomes of the Mycoplasmas, to 70-80%, for larger genomes such as Escherichia coli and yeast, of the total number of folds. The number of protein families with significant sequence conservation was estimated to be between 4000 and 7000, with structures available for about 20% of these. © 2000 Academic Press.
Pages: 897-905
Page count: 9