Protein 3D Structure Computed from Evolutionary Sequence Variation

Cited by: 784
Authors
Marks, Debora S. [1]
Colwell, Lucy J. [2]
Sheridan, Robert [3]
Hopf, Thomas A. [1]
Pagnani, Andrea [4]
Zecchina, Riccardo [4, 5]
Sander, Chris [3]
Affiliations
[1] Harvard Univ, Sch Med, Dept Syst Biol, Boston, MA 02114 USA
[2] MRC Lab Mol Biol, Cambridge, England
[3] Mem Sloan Kettering Canc Ctr, Computat Biol Ctr, New York, NY 10021 USA
[4] Human Genet Fdn, Turin, Italy
[5] Politecn Torino, Turin, Italy
Source
PLOS ONE | 2011 / Vol. 6 / Issue 12
Funding
Engineering and Physical Sciences Research Council (UK)
Keywords
ALL-ATOM REFINEMENT; STRUCTURE PREDICTION; RESIDUE CONTACTS; NETWORKS; CONFORMATION; INFORMATION; BOTTLENECKS; RESOLUTION; PROGRESS; DYNAMICS;
DOI
10.1371/journal.pone.0028766
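Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract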
The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7-4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org).
This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes.
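
The abstract describes inferring residue pair couplings from a maximum entropy model constrained by alignment statistics, but it does not state how the model is solved. As a rough illustration only, the sketch below uses a mean-field (inverse covariance) approximation, which is one common way to fit such pairwise models and is an assumption here, not a statement of the authors' procedure. The function name `mean_field_couplings`, the `pseudocount` weight, the Frobenius-norm score, and the toy alignment are all hypothetical choices made for this example.

```python
import numpy as np


def mean_field_couplings(msa, q, pseudocount=0.5):
    """Score column pairs of an alignment by mean-field coupling strength.

    msa: integer array of shape (n_sequences, length), symbols in 0..q-1.
    Returns a symmetric (length, length) matrix of coupling scores.
    All modeling choices here are illustrative assumptions.
    """
    n_seq, length = msa.shape
    d = q - 1  # drop the last symbol per column so the covariance is invertible

    # One-hot encode each column, then flatten to (n_seq, length * d).
    onehot = np.eye(q)[msa][:, :, :d]
    flat = onehot.reshape(n_seq, length * d)

    # Single-site and pair frequencies, mixed with a uniform pseudocount.
    f_i = (1 - pseudocount) * flat.mean(axis=0) + pseudocount / q
    f_ij = (1 - pseudocount) * (flat.T @ flat) / n_seq + pseudocount / q**2

    # Within one column, two different symbols never co-occur.
    for i in range(length):
        s = slice(i * d, (i + 1) * d)
        f_ij[s, s] = np.diag(f_i[s])

    # Mean-field approximation: couplings are minus the inverse covariance.
    cov = f_ij - np.outer(f_i, f_i)
    couplings = -np.linalg.inv(cov).reshape(length, d, length, d)

    # Collapse each (d x d) coupling block to one number per column pair.
    scores = np.sqrt((couplings ** 2).sum(axis=(1, 3)))
    np.fill_diagonal(scores, 0.0)
    return scores


# Toy alignment: columns 0 and 2 always agree, column 1 is independent.
rng = np.random.default_rng(0)
a = rng.integers(0, 4, size=500)
b = rng.integers(0, 4, size=500)
toy_msa = np.stack([a, b, a], axis=1)
toy_scores = mean_field_couplings(toy_msa, q=4)
```

In this toy case the co-varying pair (0, 2) should receive a much higher score than either pair involving the independent column, which mirrors the idea of ranking residue pairs by coupling strength before using the top pairs as proximity constraints. Real alignments would also need sequence reweighting and gap handling, which this sketch omits.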
Pages: 20