Building a Knowledge-Based Statistical Potential by Capturing High-Order Inter-residue Interactions and its Applications in Protein Secondary Structure Assessment

Cited by: 9
Authors
Li, Yaohang [1]
Liu, Hui [2]
Rata, Ionel [5]
Jakobsson, Eric [3, 4]
Affiliations
[1] Old Dominion Univ, Dept Comp Sci, Norfolk, VA 23529 USA
[2] Univ Illinois, Ctr Biophys & Computat Biol, Urbana, IL 61801 USA
[3] Univ Illinois, Dept Mol & Integrat Physiol, Beckman Inst, Urbana, IL 61801 USA
[4] Univ Illinois, Natl Ctr Supercomp Applicat, Urbana, IL 61801 USA
[5] Natl Inst Phys & Nucl Engn IFIN HH, R-77125 Bucharest, Romania
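Funding
National Science Foundation (USA)
Keywords
STRUCTURE PREDICTION; EVOLUTIONARY INFORMATION; RANGE INTERACTIONS; NEURAL-NETWORKS
DOI
10.1021/ci300207x
Chinese Library Classification number
R914 [Medicinal chemistry]
Discipline classification code
100701
Abstract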
The rapidly increasing number of protein crystal structures available in the Protein Data Bank (PDB) has made statistical analysis of complex high-order inter-residue correlations feasible. In this paper, we report a context-based secondary structure potential (CSSP) for assessing the quality of predicted protein secondary structures generated by various prediction servers. CSSP is a sequence-position-specific knowledge-based potential derived using the potentials of mean force approach, in which high-order inter-residue interactions are taken into consideration. CSSP is effective in identifying high-quality secondary structure predictions. In 56% of the targets in the CB513 benchmark, the optimal CSSP ranks as best either the native secondary structure or a prediction with Q3 accuracy higher than 90%, among the predicted secondary structures generated by 10 widely used secondary structure prediction servers. In more than 80% of the CB513 targets, the predicted secondary structures with the lowest CSSP values yield Q3 accuracy higher than 80%. Similar performance is found on the CASP9 targets. Moreover, our computational results show that CSSP using triplets outperforms CSSP using doublets and is currently also better than CSSP using quartets.
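The two quantities the abstract relies on, Q3 accuracy and a potential of mean force over residue triplets, can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the reference state, the triplet definition, or the smoothing, so the inverse-Boltzmann form `-ln(P_obs / P_ref)`, the use of consecutive residue triplets, the pseudocount, and all function names below are assumptions made for illustration only.

```python
import math
from collections import Counter

N_SS_TRIPLETS = 27  # 3 states (H, E, C) over 3 positions; assumed state space


def q3_accuracy(predicted, native):
    """Percentage of positions whose 3-state label (H/E/C) matches the native."""
    if len(predicted) != len(native) or not native:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == n for p, n in zip(predicted, native))
    return 100.0 * matches / len(native)


def train_triplet_potential(examples, pseudocount=1.0):
    """Build a toy triplet potential from (sequence, secondary structure) pairs.

    Hypothetical helper: energies follow an inverse-Boltzmann form,
    E = -ln(P(ss triplet | residue triplet) / P(ss triplet)), in units of kT.
    """
    joint = Counter()       # counts of (residue triplet, ss triplet)
    context = Counter()     # counts of residue triplet
    background = Counter()  # counts of ss triplet, used as the reference state
    for seq, ss in examples:
        for i in range(len(seq) - 2):
            res3, ss3 = seq[i:i + 3], ss[i:i + 3]
            joint[(res3, ss3)] += 1
            context[res3] += 1
            background[ss3] += 1
    total = sum(background.values())

    def energy(res3, ss3):
        # Pseudocounts keep unseen triplets from producing log(0).
        p_obs = (joint[(res3, ss3)] + pseudocount) / (
            context[res3] + pseudocount * N_SS_TRIPLETS)
        p_ref = (background[ss3] + pseudocount) / (
            total + pseudocount * N_SS_TRIPLETS)
        return -math.log(p_obs / p_ref)

    return energy


def score_prediction(seq, ss, energy):
    """Sum triplet energies along the chain; lower means more native-like."""
    return sum(energy(seq[i:i + 3], ss[i:i + 3]) for i in range(len(seq) - 2))
```

Under these assumptions, candidate predictions from several servers would be ranked by `score_prediction`, and the lowest-scoring one selected; its quality could then be checked against the native assignment with `q3_accuracy`.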
Pages: 500-508
Number of pages: 9