Achieving 80% ten-fold cross-validated accuracy for secondary structure prediction by large-scale training

Times cited: 102

Authors:

Dor, Ofer

Zhou, Yaoqi [1]

Affiliations:

[1] Indiana Univ Purdue Univ, Sch Informat, Indianapolis, IN 46202 USA

[2] SUNY Buffalo, Howard Hughes Med Inst, Ctr Single Mol Biophys, Dept Physiol & Biophys, Buffalo, NY 14214 USA

[3] Indiana Univ, Sch Med, Ctr Computat Biol & Bioinformat, Indianapolis, IN 46202 USA

Source:

PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS | 2007 / Vol. 66 / No. 4

Keywords:

solvent accessibility; solvent accessible surface area; neural network

DOI:

10.1002/prot.21298

Chinese Library Classification (CLC) codes:

Q5 [Biochemistry]; Q7 [Molecular Biology]

Discipline classification codes:

071010; 081704

Abstract:

An integrated system of neural networks, called SPINE, is established and optimized for predicting structural properties of proteins. In this paper, SPINE is applied to three-state secondary-structure prediction and residue solvent accessibility (RSA) prediction. The integrated neural networks are carefully trained with a large dataset of 2640 chains, sequence profiles generated from multiple sequence alignments, representative amino acid properties, a slow learning rate, overfitting protection, and an optimized sliding-window size. More than 200,000 weights in SPINE are optimized by maximizing the accuracy measured by Q3 (the percentage of correctly classified residues). After one month of training (CPU time) on 22 processors, SPINE yields a 10-fold cross-validated secondary-structure prediction accuracy of 79.5% (80.0% for chains between 50 and 300 residues long). An accuracy of 87.5% is achieved for exposed residues (RSA > 95%); this approaches the theoretical upper limit of 88-90% accuracy in assigning secondary structures. Accuracies of 73% for three-state solvent-accessibility prediction (25%/75% cutoffs) and 79.3% for two-state prediction (25% cutoff) are also obtained.
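The accuracy figures quoted in the abstract are per-residue counts. The sketch below is a hypothetical illustration, not SPINE's code: `q3` computes the percentage of correctly classified residues from two equal-length label strings (assuming one character per residue, e.g. H/E/C), and `rsa_state` discretizes an RSA value given in percent using the 25% (two-state) or 25%/75% (three-state) cutoffs. The function names, state names, and the rule that a value exactly at a cutoff goes to the more exposed state are all assumptions.

```python
"""Hypothetical helpers illustrating Q3 scoring and RSA state assignment.

Not taken from SPINE; names and boundary conventions are assumptions.
"""
from typing import Sequence


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state matches the observed one.

    Labels are assumed to be one character per residue, e.g. H (helix),
    E (strand), C (coil). The same count gives the accuracy of two- or
    three-state RSA prediction if RSA states are encoded the same way.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("cannot score an empty chain")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def rsa_state(rsa_percent: float, cutoffs: Sequence[float] = (25.0,)) -> str:
    """Map an RSA value (in percent) to a discrete state.

    One cutoff (25%) gives two states; two cutoffs (25%, 75%) give three.
    Assumed convention: a value equal to a cutoff goes to the more exposed state.
    """
    names = {1: ("buried", "exposed"), 2: ("buried", "intermediate", "exposed")}
    if len(cutoffs) not in names:
        raise ValueError("expected one or two cutoffs")
    # Count how many cutoffs the value reaches; that count indexes the state.
    index = sum(rsa_percent >= c for c in sorted(cutoffs))
    return names[len(cutoffs)][index]
```

For example, `q3("HHHEECCC", "HHHEECCE")` returns 87.5 (7 of 8 residues match), and `rsa_state(50.0, (25.0, 75.0))` returns "intermediate". The cross-validated figures in the abstract are computed on held-out folds; this sketch shows only the per-residue scoring.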

Pages: 838-845

Number of pages: 8

Number of references: 64