MetaPSICOV: combining coevolution methods for accurate prediction of contacts and long range hydrogen bonding in proteins

Cited by: 246
Authors
Jones, David T. [1]
Singh, Tanya [1]
Kosciolek, Tomasz [1]
Tetchner, Stuart [1]
Affiliations
[1] UCL, Dept Comp Sci, Bioinformat Grp, London WC1E 6BT, England
Funding
Wellcome Trust (UK); Biotechnology and Biological Sciences Research Council (UK);
Keywords
RESIDUE CONTACTS; SUBSTITUTIONS; INFORMATION; ALIGNMENTS; MUTATION;
DOI
10.1093/bioinformatics/btu791
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Motivation: Recent developments of statistical techniques to infer direct evolutionary couplings between residue pairs have rendered covariation-based contact prediction a viable means for accurate 3D modelling of proteins, with no information other than the sequence required. To extend the usefulness of contact prediction, we have designed a new meta-predictor (MetaPSICOV) which combines three distinct approaches for inferring covariation signals from multiple sequence alignments, considers a broad range of other sequence-derived features and, uniquely, a range of metrics which describe both the local and global quality of the input multiple sequence alignment. Finally, we use a two-stage predictor, where the second stage filters the output of the first stage. This two-stage predictor is additionally evaluated on its ability to accurately predict the long range network of hydrogen bonds, including correctly assigning the donor and acceptor residues.
Results: Using the original PSICOV benchmark set of 150 protein families, MetaPSICOV achieves a mean precision of 0.54 for top-L predicted long range contacts, around 60% higher than PSICOV and around 40% better than CCMpred. In de novo protein structure prediction using FRAGFOLD, MetaPSICOV is able to improve the TM-scores of models by a median of 0.05 compared with PSICOV. Lastly, for predicting long range hydrogen bonding, MetaPSICOV-HB achieves a precision of 0.69 for the top-L/10 hydrogen bonds compared with just 0.26 for the baseline MetaPSICOV.
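The top-L and top-L/10 precision figures quoted in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function name `top_l_precision`, the dictionary-based input format, and the sequence-separation threshold of 24 residues for "long range" are all assumptions made here for illustration (the abstract does not state the threshold). The idea is to rank predicted residue pairs by score, keep the top L/divisor long-range pairs for a protein of length L, and report the fraction that are true contacts.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]  # residue index pair (i, j)


def top_l_precision(
    scores: Dict[Pair, float],
    true_contacts: Set[Pair],
    seq_len: int,
    min_sep: int = 24,  # ASSUMPTION: "long range" cutoff, not given in the abstract
    divisor: int = 1,   # 1 -> top-L, 10 -> top-L/10
) -> float:
    """Fraction of the top (seq_len // divisor) long-range predictions that are true."""
    # Normalise pairs to (low, high) so that (i, j) and (j, i) compare equal.
    truth = {(min(i, j), max(i, j)) for i, j in true_contacts}

    # Keep only pairs whose sequence separation counts as long range.
    long_range = [
        ((min(i, j), max(i, j)), score)
        for (i, j), score in scores.items()
        if abs(i - j) >= min_sep
    ]

    # Highest-scoring predictions first.
    long_range.sort(key=lambda item: item[1], reverse=True)

    n_top = max(1, seq_len // divisor)
    selected = long_range[:n_top]
    if not selected:
        return 0.0

    hits = sum(1 for pair, _ in selected if pair in truth)
    # Dividing by the number of pairs actually selected is one convention;
    # dividing by n_top is another when fewer predictions are available.
    return hits / len(selected)
```

For example, calling `top_l_precision(scores, contacts, seq_len=150, divisor=10)` would score the 15 highest-ranked long-range pairs of a hypothetical 150-residue protein.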
Pages: 999-1006
Page count: 8