Thoroughly sampling sequence space: Large-scale protein design of structural ensembles

Cited by: 89
Authors
Larson, SM
England, JL
Desjarlais, JR
Pande, VS [1]
Affiliations
[1] Stanford Univ, Dept Chem, Stanford, CA 94305 USA
[2] Stanford Univ, Biophys Program, Stanford, CA 94305 USA
[3] Harvard Univ, Biochem Sci Program, Cambridge, MA 02138 USA
[4] Xencor Inc, Monrovia, CA 91016 USA
Keywords
protein design; sequence space; designability; backbone flexibility; distributed computing
DOI
10.1110/ps.0203902
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular biology]
Subject classification code
071010; 081704
Abstract
Modeling the inherent flexibility of the protein backbone as part of computational protein design is necessary to capture the behavior of real proteins and is a prerequisite for the accurate exploration of protein sequence space. We present the results of a broad exploration of sequence space, with backbone flexibility, through a novel approach: large-scale protein design to structural ensembles. A distributed computing architecture has allowed us to generate hundreds of thousands of diverse sequences for a set of 253 naturally occurring proteins, yielding insights into the nature of protein sequence space. Designing to a structural ensemble produces a much greater diversity of sequences than previous studies have reported, and homology searches using profiles derived from the designed sequences against the Protein Data Bank show that the relevance and quality of the sequences are not diminished. The designed sequences have greater overall diversity than corresponding natural sequence alignments, and no direct correlations are seen between the diversity of natural sequence alignments and the diversity of the corresponding designed sequences. For structures in the same fold, the sequence entropies of the designed sequences cluster together tightly. This tight clustering of sequence entropies within a fold and the separation of sequence entropy distributions for different folds suggest that the diversity of designed sequences is primarily determined by a structure's overall fold, and that the designability principle postulated from studies of simple models holds in real proteins. This has important implications for experimental protein design and engineering, and provides insight into protein evolution.
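The abstract compares the "sequence entropies" of designed sequences across folds but does not say how the entropy is computed. As a rough illustration only, the sketch below assumes a per-position Shannon entropy (natural log) over a set of equal-length, gap-free sequences, averaged over positions. The function names `column_entropy` and `mean_sequence_entropy` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in nats) of the residues observed at one position.

    Assumption: plain frequency counts, no pseudocounts or sequence weighting.
    """
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def mean_sequence_entropy(sequences):
    """Average per-position entropy over a set of equal-length sequences.

    Assumption: the sequences are already aligned to the same backbone,
    so position i in every sequence refers to the same residue site.
    """
    if not sequences or not sequences[0]:
        raise ValueError("need at least one non-empty sequence")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    # zip(*sequences) yields one tuple of residues per position.
    return sum(column_entropy(col) for col in zip(*sequences)) / length


if __name__ == "__main__":
    # Toy data, not from the paper: two short "designed" sequence sets.
    conserved = ["ACDE", "ACDE", "ACDE"]
    diverse = ["ACDE", "GCHK", "LCMN"]
    print(mean_sequence_entropy(conserved))
    print(mean_sequence_entropy(diverse))
```

Under these assumptions, a set of identical sequences scores zero and the score grows as more residue types appear at each position, which is the sense of "diversity" the abstract uses when comparing folds.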
Pages: 2804-2813
Number of pages: 10