Combining prediction of secondary structure and solvent accessibility in proteins

Cited by: 240
Authors
Adamczak, R
Porollo, A
Meller, J
Affiliations
[1] Childrens Hosp Res Fdn, Cincinnati, OH 45229 USA
[2] Nicholas Copernicus Univ, Dept Informat, Torun, Poland
Keywords
secondary structure; neural networks; classification; protein structure prediction; relative solvent accessibility; SABLE
DOI
10.1002/prot.20441
Chinese Library Classification (CLC) numbers
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Owing to the use of evolutionary information and advanced machine learning protocols, the secondary structure of amino acid residues in proteins can be predicted from the primary sequence with more than 75% per-residue accuracy for the 3-state (i.e., helix, beta-strand, and coil) classification problem. In this work we investigate whether further progress may be achieved by incorporating the relative solvent accessibility (RSA) of an amino acid residue as a fingerprint of the overall topology of the protein. Toward that goal, we developed a novel method for secondary structure prediction that uses predicted RSA in addition to attributes derived from evolutionary profiles. Our general approach follows the 2-stage protocol of Rost and Sander, with a number of Elman-type recurrent neural networks (NNs) combined into a consensus predictor. The RSA is predicted using our recently developed regression-based method, which provides real-valued RSA with an overall correlation coefficient of about 0.66 between actual and predicted RSA in rigorous tests on independent control sets. Using the predicted RSA, we improved the performance of our secondary structure prediction by up to 1.4% and achieved overall per-residue accuracies between 77.0% and 78.4% for the 3-state classification problem on different control sets comprising, together, 603 proteins without homology to proteins included in the training set. The effect of including solvent accessibility depends on the quality of RSA prediction: in the limit of perfect prediction (i.e., when using the actual RSA values derived from known protein structures), the accuracy of secondary structure prediction increases by up to 4%. We also observed that projecting real-valued RSA onto 2 discrete classes with the commonly used threshold of 25% RSA decreases the classification accuracy of secondary structure prediction.
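The quantities the abstract reports (3-state per-residue accuracy, the correlation coefficient between actual and predicted RSA, and the 2-class projection at 25% RSA) can be illustrated with a minimal Python sketch. It assumes secondary structure is given as strings over H/E/C, RSA as fractions in [0, 1], Pearson's coefficient as the "correlation coefficient", and that a residue at or above the threshold counts as exposed. The function names are hypothetical and are not part of SABLE.

```python
from math import sqrt


def q3_accuracy(actual: str, predicted: str) -> float:
    """Per-residue 3-state accuracy: fraction of positions where H/E/C labels agree."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("label strings must be non-empty and of equal length")
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)


def pearson_cc(actual: list, predicted: list) -> float:
    """Pearson correlation coefficient between actual and predicted RSA values."""
    n = len(actual)
    if n < 2 or n != len(predicted):
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / sqrt(var_a * var_p)


def project_rsa(rsa: list, threshold: float = 0.25) -> list:
    """Map real-valued RSA (fraction 0..1) onto 2 classes.

    Assumption: 'e' (exposed) if RSA >= threshold, otherwise 'b' (buried).
    """
    return ["e" if r >= threshold else "b" for r in rsa]


if __name__ == "__main__":
    print(q3_accuracy("HHHHCCEEEC", "HHHCCCEEEE"))  # 0.8
    print(project_rsa([0.05, 0.24, 0.26, 0.95]))   # ['b', 'b', 'e', 'e']
    print(round(pearson_cc([0.1, 0.4, 0.7], [0.2, 0.5, 0.8]), 6))  # 1.0
```

In the projection, 0.26 and 0.95 both become 'e' while 0.24 becomes 'b': the binary classes discard the gradation that real-valued RSA keeps, which is one plausible reason why thresholding lowers the downstream accuracy.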
While the level of improvement in secondary structure prediction may differ for prediction protocols that implicitly account for RSA in other ways, we conclude that the 3-state classification accuracy can be increased by combining RSA with a state-of-the-art protocol utilizing evolutionary profiles. The new method is available through a Web server at http://sable.cchmc.org. (c) 2005 Wiley-Liss, Inc.
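One common way to combine evolutionary profiles with per-residue RSA as network inputs is a sliding window over neighboring positions. The sketch below builds such inputs; the window half-width of 7, the 20-column profile rows, and the terminus flag are illustrative assumptions rather than SABLE's documented encoding, and `window_features` is a hypothetical helper.

```python
from typing import List, Sequence


def window_features(
    profile: Sequence[Sequence[float]],
    rsa: Sequence[float],
    half_window: int = 7,
) -> List[List[float]]:
    """Build one input vector per residue from a window of profile rows plus predicted RSA.

    For residue i, positions j = i - half_window .. i + half_window each contribute
    the profile row, the predicted RSA, and a flag that is 1.0 only when j lies
    beyond a chain terminus (in which case the other values are zero-padded).
    """
    n = len(profile)
    if n == 0 or n != len(rsa):
        raise ValueError("profile and RSA must be non-empty and cover the same residues")
    width = len(profile[0])
    features = []
    for i in range(n):
        vec: List[float] = []
        for j in range(i - half_window, i + half_window + 1):
            if 0 <= j < n:
                vec.extend(profile[j])
                vec.append(rsa[j])
                vec.append(0.0)  # position inside the chain
            else:
                vec.extend([0.0] * width)
                vec.append(0.0)
                vec.append(1.0)  # padding beyond the N- or C-terminus
        features.append(vec)
    return features


if __name__ == "__main__":
    # Toy input: 3 residues, 20 profile columns each, hypothetical predicted RSA.
    profile = [[0.05] * 20 for _ in range(3)]
    feats = window_features(profile, [0.1, 0.5, 0.9], half_window=1)
    print(len(feats), len(feats[0]))  # 3 66
```

Each vector has (2 * half_window + 1) * (profile width + 2) entries; with the default half-width and a 20-column profile that is 15 * 22 = 330 inputs. Because the networks described here are Elman-type recurrent networks, which also carry hidden state from one input to the next, this sketch covers only the input encoding, not the network itself.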
Pages: 467-475
Number of pages: 9