Prediction of coordination number and relative solvent accessibility in proteins

Cited by: 182
Authors
Pollastri, G
Baldi, P [1]
Fariselli, P
Casadio, R
Affiliations
[1] Univ Calif Irvine, Inst Genom & Bioinformat, Dept Informat & Comp Sci, Irvine, CA 92697 USA
[2] Univ Calif Irvine, Coll Med, Dept Biol Chem, Irvine, CA USA
[3] Univ Bologna, Dept Biol, CIRB Biocomp Unit, I-40126 Bologna, Italy
[4] Univ Bologna, Biophys Lab, I-40126 Bologna, Italy
Keywords
protein structure prediction; protein contacts; contact map; contact number; recurrent neural networks; evolutionary information
DOI
10.1002/prot.10069
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Knowing the coordination number and relative solvent accessibility of all the residues in a protein is crucial for deriving constraints useful in modeling protein folding and protein structure and in scoring remote homology searches. We develop ensembles of bidirectional recurrent neural network architectures to improve the state of the art in both contact and accessibility prediction, leveraging a large corpus of curated data together with evolutionary information. The ensembles are used to discriminate between two different states of residue contacts or relative solvent accessibility, higher or lower than a threshold determined by the average value of the residue distribution or the accessibility cutoff. For coordination numbers, the ensemble achieves performances ranging within 70.6-73.9% depending on the radius adopted to discriminate contacts (6-12 Å). These performances represent gains of 16-20% over the baseline statistical predictor, always assigning an amino acid to the largest class, and are 4-7% better than any previous method. A combination of different radius predictors further improves performance. For accessibility thresholds in the relevant 15-30% range, the ensemble consistently achieves a performance above 77%, which is 10-16% above the baseline prediction and better than other existing predictors, by up to several percentage points. For both problems, we quantify the improvement due to evolutionary information in the form of PSI-BLAST-generated profiles over BLAST profiles. The prediction programs are implemented in the form of two web servers, CONpro and ACCpro, available at http://promoter.ics.uci.edu/BRNN-PRED/.
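The two-state setup and the majority-class baseline mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy coordinates, the use of one representative atom per residue, and the counting of every other residue within the radius (including sequence neighbours) are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def coordination_numbers(coords: Sequence[Coord], radius: float) -> List[int]:
    """For each residue, count the other residues within `radius` angstroms.

    Assumption: one representative atom per residue; sequence neighbours
    are counted like any other residue.
    """
    counts = []
    for i, a in enumerate(coords):
        n = sum(
            1
            for j, b in enumerate(coords)
            if j != i and math.dist(a, b) <= radius
        )
        counts.append(n)
    return counts


def binarize(values: Sequence[float], threshold: float) -> List[int]:
    """Two-state labels: 1 if the value is above the threshold, else 0."""
    return [1 if v > threshold else 0 for v in values]


def majority_baseline_accuracy(labels: Sequence[int]) -> float:
    """Accuracy of a predictor that always outputs the largest class."""
    ones = sum(labels)
    return max(ones, len(labels) - ones) / len(labels)


# Hypothetical toy chain: four residues 3.8 angstroms apart plus one far away.
coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (11.4, 0.0, 0.0),
    (30.0, 0.0, 0.0),
]
counts = coordination_numbers(coords, radius=8.0)
threshold = sum(counts) / len(counts)  # average of the distribution
labels = binarize(counts, threshold)
baseline = majority_baseline_accuracy(labels)
```

The same `binarize` helper would apply to relative solvent accessibility values with a fixed cutoff (for example 0.25 for a 25% threshold) in place of the distribution average. A trained predictor is then judged by how far its two-state accuracy exceeds `baseline`.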
Pages: 142-153
Page count: 12