A structural census of the current population of protein sequences

Cited by: 71
Authors
Gerstein, M [1]
Levitt, M [1]
Affiliations
[1] STANFORD UNIV, DEPT BIOL STRUCT, STANFORD, CA 94305 USA
Keywords
sequence analysis; genome comparison; fold family; databank statistics; protein evolution
DOI
10.1073/pnas.94.22.11911
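Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09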
Abstract
We examine the occurrence of the ≈300 known protein folds in different groups of organisms. To do this, we characterize a large fraction of the currently known protein sequences (≈140,000) in structural terms, by matching them to known structures via sequence comparison (or by secondary-structure class prediction for those without structural homologues). Overall, we find that an appreciable fraction of the known folds are present in each of the major groups of organisms (e.g., bacteria and eukaryotes share 156 of 275 folds), and most of the common folds are associated with many families of nonhomologous sequences (i.e., >10 sequence families for each common fold). However, different groups of organisms have characteristically distinct distributions of folds. So, for instance, some of the most common folds in vertebrates, such as globins or zinc fingers, are rare or absent in bacteria. Many of these differences in fold usage are biologically reasonable, such as the folds of metabolic enzymes being common in bacteria and those associated with extracellular transport and communication being common in animals. They also have important implications for database-based methods for fold recognition, suggesting that an unknown sequence from a plant is more likely to have a certain fold (e.g., a TIM barrel) than an unknown sequence from an animal.
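The census idea in the abstract (assign each sequence a fold, then compare fold usage between groups of organisms) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the toy assignments are hypothetical and are not data from the paper, and the function names fold_counts, shared_folds, and fold_frequency are invented for this example. It does not reproduce the authors' sequence-comparison or secondary-structure prediction steps; it only shows the counting that follows once fold assignments exist.

```python
from collections import Counter

# Hypothetical toy (organism group, fold) assignments; NOT data from the paper.
assignments = [
    ("bacteria", "TIM barrel"),
    ("bacteria", "TIM barrel"),
    ("bacteria", "Rossmann fold"),
    ("animals", "globin"),
    ("animals", "zinc finger"),
    ("animals", "TIM barrel"),
    ("animals", "zinc finger"),
]


def fold_counts(assignments):
    """Count how often each fold occurs within each organism group."""
    counts = {}
    for group, fold in assignments:
        counts.setdefault(group, Counter())[fold] += 1
    return counts


def shared_folds(counts, group_a, group_b):
    """Return the folds observed at least once in both groups."""
    return set(counts[group_a]) & set(counts[group_b])


def fold_frequency(counts, group, fold):
    """Fraction of a group's assigned sequences that have the given fold."""
    total = sum(counts[group].values())
    return counts[group][fold] / total if total else 0.0


counts = fold_counts(assignments)
print(shared_folds(counts, "bacteria", "animals"))
print(fold_frequency(counts, "bacteria", "TIM barrel"))
print(fold_frequency(counts, "animals", "TIM barrel"))
```

A per-group frequency of this kind could serve as a rough prior in fold recognition, in the spirit of the abstract's closing remark: before looking at the sequence itself, a fold that is common in the source organism's group is a more likely candidate.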
Pages: 11911-11916
Page count: 6