Is There an Optimal Substitution Matrix for Contact Prediction with Correlated Mutations?

Cited by: 5
Authors
Di Lena, Pietro [1]
Fariselli, Piero [2]
Margara, Luciano [1]
Vassura, Marco [1]
Casadio, Rita [2]
Affiliations
[1] Univ Bologna, Dept Comp Sci, I-40127 Bologna, Italy
[2] Univ Bologna, Dept Biol, Biocomp Grp, I-40127 Bologna, Italy
Keywords
Protein contact prediction; correlated mutations; similarity matrix
DOI
10.1109/TCBB.2010.91
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Correlated mutations in proteins are believed to occur in order to preserve the protein's functional fold through evolution. Their values can be deduced from sequence and/or structural alignments and are indicative of residue contacts in the protein three-dimensional structure. Correlation between pairs of residues is routinely evaluated with the Pearson correlation coefficient and the MCLACHLAN similarity matrix. In the literature, there is no justification for adopting MCLACHLAN rather than other substitution matrices. In this paper, we address the problem of computing the optimal similarity matrix for contact prediction with correlated mutations, i.e., the similarity matrix that maximizes the accuracy of contact prediction with correlated mutations. We describe an optimization procedure, based on the gradient descent method, for computing the optimal similarity matrix and perform extensive experimental tests. Our tests show that there are many optimal matrices that perform similarly to MCLACHLAN. We also find that the upper limit on the accuracy achievable in protein contact prediction is independent of the optimized similarity matrix. This suggests that the poor performance of the correlated mutations approach may be due to the choice of a linear correlation function for evaluating correlated mutations.
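As an illustration of the scoring scheme the abstract refers to, the following is a minimal sketch of a correlated mutation score: for two alignment columns, the similarity of the residues is taken for every pair of sequences, and the two resulting vectors are compared with the Pearson correlation coefficient. This is a sketch under stated assumptions, not the authors' implementation: the function names, the toy alignment, and the identity similarity matrix are hypothetical, sequence weighting is omitted, and a real analysis would substitute MCLACHLAN or an optimized matrix for `toy_similarity`.

```python
from itertools import combinations
from math import sqrt


def pair_similarities(column, sim):
    """Similarity of the residues in one alignment column, for every pair of sequences."""
    return [sim[(a, b)] for a, b in combinations(column, 2)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a fully conserved column carries no correlation signal
    return cov / sqrt(var_x * var_y)


def correlated_mutation_score(msa, i, j, sim):
    """Unweighted correlated mutation score between alignment columns i and j."""
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    return pearson(pair_similarities(col_i, sim), pair_similarities(col_j, sim))


def toy_similarity(alphabet):
    """Hypothetical stand-in matrix: 1 for identical residues, 0 otherwise."""
    return {(a, b): 1.0 if a == b else 0.0 for a in alphabet for b in alphabet}


# Toy alignment (an assumption for illustration): columns 0 and 1 change together.
msa = ["AKL", "AKV", "GEL", "GEV"]
sim = toy_similarity("AKLVGE")
score_01 = correlated_mutation_score(msa, 0, 1, sim)
score_02 = correlated_mutation_score(msa, 0, 2, sim)
```

In this toy alignment, columns 0 and 1 mutate together and receive the maximum score, while column 2 varies independently of column 0 and scores lower. An optimization such as the gradient descent procedure mentioned in the abstract would adjust the entries of `sim` to improve how well such scores rank true contacts; that step is not shown here.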
Pages: 1017-1028
Number of pages: 12
Related papers
24 in total
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]   SCOP database in 2004: refinements integrate structure and sequence family data [J].
Andreeva, A ;
Howorth, D ;
Brenner, SE ;
Hubbard, TJP ;
Chothia, C ;
Murzin, AG .
NUCLEIC ACIDS RESEARCH, 2004, 32 :D226-D229
[3]   Reducing phylogenetic bias in correlated mutation analysis [J].
Ashkenazy, Haim ;
Kliger, Yossef .
PROTEIN ENGINEERING DESIGN & SELECTION, 2010, 23 (05) :321-326
[4]   Optimal data collection for correlated mutation analysis [J].
Ashkenazy, Haim ;
Unger, Ron ;
Kliger, Yossef .
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2009, 74 (03) :545-555
[5]   The effect of backbone on the small-world properties of protein contact maps [J].
Bartoli, L. ;
Fariselli, P. ;
Casadio, R. .
PHYSICAL BIOLOGY, 2007, 4 (04) :L1-L5
[6]   The Protein Data Bank [J].
Berman, HM ;
Westbrook, J ;
Feng, Z ;
Gilliland, G ;
Bhat, TN ;
Weissig, H ;
Shindyalov, IN ;
Bourne, PE .
NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :235-242
[7]   Macromolecular modeling with Rosetta [J].
Das, Rhiju ;
Baker, David .
ANNUAL REVIEW OF BIOCHEMISTRY, 2008, 77 :363-382
[8]  
Dayhoff M O., 1978, Atlas of Protein Seq Struct, p. 345
[9]   Assessment of domain boundary predictions and the prediction of intramolecular contacts in CASP8 [J].
Ezkurdia, Iakes ;
Grana, Osvaldo ;
Izarzugaza, Jose M. G. ;
Tress, Michael L. .
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2009, 77 :196-209
[10]   CORRELATED MUTATIONS AND RESIDUE CONTACTS IN PROTEINS [J].
GOBEL, U ;
SANDER, C ;
SCHNEIDER, R ;
VALENCIA, A .
PROTEINS-STRUCTURE FUNCTION AND GENETICS, 1994, 18 (04) :309-317