Direct-coupling analysis of residue coevolution captures native contacts across many protein families

Cited by: 1010
Authors
Morcos, Faruck [1]
Pagnani, Andrea [2]
Lunt, Bryan [1]
Bertolino, Arianna [3]
Marks, Debora S. [4]
Sander, Chris [5]
Zecchina, Riccardo [2, 6, 7]
Onuchic, Jose N. [1, 8]
Hwa, Terence [1]
Weigt, Martin [2, 9]
Affiliations
[1] Univ Calif San Diego, Ctr Theoret Biol Phys, La Jolla, CA 92093 USA
[2] Human Genet Fdn, I-10126 Turin, Italy
[3] Inst Sci Interchange, I-10133 Turin, Italy
[4] Harvard Univ, Sch Med, Dept Syst Biol, Boston, MA 02115 USA
[5] Mem Sloan Kettering Canc Ctr, Computat Biol Ctr, New York, NY 10065 USA
[6] Politecn Torino, Ctr Computat Studies, I-10129 Turin, Italy
[7] Politecn Torino, Dipartimento Fis, I-10129 Turin, Italy
[8] Rice Univ, Ctr Theoret Biol Phys, Houston, TX 77005 USA
[9] Univ Paris 06, Unite Mixte Rech 7238, Lab Genom Microorganismes, F-75006 Paris, France
Funding
National Science Foundation (USA)
Keywords
statistical sequence analysis; residue-residue covariation; contact map prediction; maximum-entropy modeling; CRYSTAL-STRUCTURE; LONG-RANGE; INFORMATION-THEORY; LIGAND-BINDING; 2-COMPONENT; MECHANISM; TRANSPORTER; RESOLUTION; COMPLEXES; DOMAIN;
DOI
10.1073/pnas.1111471108
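Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract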
The similarity in the three-dimensional structures of homologous proteins imposes strong constraints on their sequence variability. It has long been suggested that the resulting correlations among amino acid compositions at different sequence positions can be exploited to infer spatial contacts within the tertiary protein structure. Crucial to this inference is the ability to disentangle direct and indirect correlations, as accomplished by the recently introduced direct-coupling analysis (DCA). Here we develop a computationally efficient implementation of DCA, which allows us to evaluate the accuracy of contact prediction by DCA for a large number of protein domains, based purely on sequence information. DCA is shown to yield a large number of correctly predicted contacts, recapitulating the global structure of the contact map for the majority of the protein domains examined. Furthermore, our analysis captures clear signals beyond intradomain residue contacts, arising, e.g., from alternative protein conformations, ligand-mediated residue couplings, and interdomain interactions in protein oligomers. Our findings suggest that contacts predicted by DCA can be used as a reliable guide to facilitate computational predictions of alternative protein conformations, protein complex formation, and even the de novo prediction of protein domain structures, contingent on the existence of a large number of homologous sequences which are being rapidly made available due to advances in genome sequencing.
Pages: E1293-E1301
Page count: 9