Ranking predicted protein structures with support vector regression

被引:65
作者
Qiu, Jian [1 ]
Sheffler, Will [1 ]
Baker, David [1 ,2 ]
Noble, William Stafford [1 ,3 ]
机构
[1] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
[2] Univ Washington, Dept Biochem, Seattle, WA 98195 USA
[3] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
关键词
protein structure prediction; scoring function; support vector regression; machine learning; consensus-based feature;
D O I
10.1002/prot.21809
中图分类号
Q5 [生物化学]; Q7 [分子生物学];
学科分类号
071010 ; 081704 ;
摘要
Protein structure prediction is an important problem of both intellectual and practical interest. Most protein structure prediction approaches generate multiple candidate models first, and then use a scoring function to select the best model among these candidates. In this work, we develop a scoring function using support vector regression (SVR). Both consensus-based features and features from individual structures are extracted from a training data set containing native protein structures and predicted structural models submitted to CASP5 and CASP6. The SVR learns a scoring function that is a linear combination of these features. We test this scoring function on two data sets. First, when used to rank server models submitted to CASP7, the SVR score selects predictions that are comparable to the best performing server in CASP7, Zhang-Server, and significantly better than all the other servers. Even if the SVR score is not allowed to select Zhang-Server models, the SVR score still selects predictions that are significantly better than all the other servers. In addition, the SVR is able to select significantly better models and yield significantly better Pearson correlation coefficients than the two best Quality Assessment groups in CASP7, QA556 (LEE), and QA634 (Pcons). Second, this work aims to improve the ability of the Robetta server to select best models, and hence we evaluate the performance of the SVR score on ranking the Robetta server template-based models for the CASP7 targets. The SVR selects significantly better models than the Robetta K*Sync consensus alignment score.
引用
收藏
页码:1175 / 1182
页数:8
相关论文
共 42 条
[1]  
Andersen E. D., 2000, HIGH PERFORMANCE OPT, P197, DOI DOI 10.1007/978-1-4757-3216-0_8
[2]   Protein structure prediction and structural genomics [J].
Baker, D ;
Sali, A .
SCIENCE, 2001, 294 (5540) :93-96
[3]   Kernel methods for predicting protein-protein interactions [J].
Ben-Hur, A ;
Noble, WS .
BIOINFORMATICS, 2005, 21 :I38-I46
[4]   A METHOD TO IDENTIFY PROTEIN SEQUENCES THAT FOLD INTO A KNOWN 3-DIMENSIONAL STRUCTURE [J].
BOWIE, JU ;
LUTHY, R ;
EISENBERG, D .
SCIENCE, 1991, 253 (5016) :164-170
[5]   CHARMM - A PROGRAM FOR MACROMOLECULAR ENERGY, MINIMIZATION, AND DYNAMICS CALCULATIONS [J].
BROOKS, BR ;
BRUCCOLERI, RE ;
OLAFSON, BD ;
STATES, DJ ;
SWAMINATHAN, S ;
KARPLUS, M .
JOURNAL OF COMPUTATIONAL CHEMISTRY, 1983, 4 (02) :187-217
[6]   Knowledge-based analysis of microarray gene expression data by using support vector machines [J].
Brown, MPS ;
Grundy, WN ;
Lin, D ;
Cristianini, N ;
Sugnet, CW ;
Furey, TS ;
Ares, M ;
Haussler, D .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2000, 97 (01) :262-267
[7]   Protein function classification via support vector machine approach [J].
Cai, CZ ;
Wang, WL ;
Sun, LZ ;
Chen, YZ .
MATHEMATICAL BIOSCIENCES, 2003, 185 (02) :111-122
[8]   Prediction of CASP6 structures using automated Robetta protocols [J].
Chivian, D ;
Kim, DE ;
Malmström, L ;
Schonbrun, J ;
Rohl, CA ;
Baker, D .
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2005, 61 :157-166
[9]   Homology modeling using parametric alignment ensemble generation with consensus and energy-based model selection [J].
Chivian, Dylan ;
Baker, David .
NUCLEIC ACIDS RESEARCH, 2006, 34 (17)
[10]  
Cristianini N., 2000, Intelligent Data Analysis: An Introduction