A pair-to-pair amino acids substitution matrix and its applications for protein structure prediction

Cited by: 31
Authors
Eyal, Eran
Frenkel-Morgenstern, Milana
Sobolev, Vladimir
Pietrokovski, Shmuel
Affiliations
[1] Weizmann Inst Sci, Dept Plant Sci, IL-76100 Rehovot, Israel
[2] Weizmann Inst Sci, Dept Mol Genet, IL-76100 Rehovot, Israel
Keywords
contact prediction; correlated mutations; ab initio
DOI
10.1002/prot.21223
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
We present a new structurally derived pair-to-pair substitution matrix (P2PMAT). This matrix is constructed from a very large set of integrated high-quality multiple sequence alignments (Blocks) and protein structures. It evaluates the likelihoods of all 160,000 pair-to-pair substitutions. The P2PMAT matrix implicitly accounts for evolutionary conservation, correlated mutations, and residue-residue contact potentials. The usefulness of the matrix for structural predictions is shown in this article. Predicting protein residue-residue contacts from sequence information alone with our method (P2PConPred) is particularly accurate in the protein cores, where it performs better than other basic contact prediction methods (increasing accuracy by 25-60%). The method's mean accuracy for protein cores is 24% for 59 diverse families and 34% for a subset of proteins shorter than 100 residues. This is above the level that was recently shown to be sufficient to significantly improve ab initio protein structure prediction. We also demonstrate the ability of our approach to identify native structures within large sets (300-2000) of protein decoys. On the basis of evolutionary information alone, our method ranks the native structure in the top 0.3% of the decoys in 4/10 of the sets, and in 8/10 of the sets the native structure is ranked in the top 10% of the decoys. The method can thus be used to help filter out wrong models, complementing traditional scoring functions. Proteins 2007;67:142-153. © 2007 Wiley-Liss, Inc.
Pages: 142-153
Page count: 12