Prediction of protein structural class using novel evolutionary collocation-based sequence representation

Cited by: 150
Authors
Chen, Ke [1]
Kurgan, Lukasz A. [1]
Ruan, Jishou [2, 3]
Affiliations
[1] Univ Alberta, Dept Elect & Comp Engn, ECERF, Edmonton, AB T6G 2V4, Canada
[2] Nankai Univ, LPMC, Tianjin 300071, Peoples R China
[3] Nankai Univ, Chern Inst Math, Coll Math Sci, Tianjin 300071, Peoples R China
Keywords
protein structure; domain structural class; PSI-BLAST; collocation of AA pairs; evolutionary information; SCOP; support vector machine
DOI
10.1002/jcc.20918
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Knowledge of structural classes is useful for understanding folding patterns in proteins. Although existing structural class prediction methods have applied virtually all state-of-the-art classifiers, many of them use a relatively simple protein sequence representation that often includes amino acid (AA) composition. To address this limitation, we propose a novel sequence representation that incorporates evolutionary information encoded using PSI-BLAST profile-based collocation of AA pairs. We used six benchmark datasets and five representative classifiers to quantify and compare the quality of structural class prediction with the proposed representation. The best classifier, a support vector machine, achieved 61-96% accuracy on the six datasets. These predictions were comprehensively compared with a wide range of recently proposed methods for prediction of structural classes. The comparison shows the superiority of the proposed representation, which yields error rate reductions of between 14% and 26% when compared with the predictions of the best-performing previously published classifiers on the considered datasets. The study also shows that, for the three benchmark datasets that include sequences characterized by low identity (i.e., 25%, 30%, and 40%), the prediction accuracies are 20-35% lower than for the other three datasets, which include sequences with a higher degree of similarity. In conclusion, the proposed representation is shown to substantially improve the accuracy of structural class prediction. A web server that implements the presented prediction method is freely available at http://biomine.ece.ualberta.ca/Structural_Class/SCEC.html. © 2008 Wiley Periodicals, Inc.
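
The abstract does not define how the collocation of AA pairs is computed from a PSI-BLAST profile, so the following Python sketch shows only one plausible reading and is not the authors' formulation. It assumes a per-position profile of probability-like weights over the 20 amino acids and accumulates, for each ordered pair of residues separated by a fixed gap, the product of their weights. The names `collocation_features`, `one_hot_profile`, and the `gap` parameter are hypothetical and introduced for illustration.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def collocation_features(profile, gap=0):
    """Profile-weighted collocation of amino acid pairs (illustrative assumption).

    profile: list with one dict per sequence position, mapping an amino acid
             letter to a probability-like weight (missing letters count as 0).
    gap:     number of positions skipped between the two residues of a pair.
    Returns a dict with 400 entries, one per ordered amino acid pair.
    """
    step = gap + 1
    n_pairs = len(profile) - step
    features = {a + b: 0.0 for a, b in product(AMINO_ACIDS, repeat=2)}
    if n_pairs <= 0:
        return features  # sequence too short for this gap
    for i in range(n_pairs):
        left, right = profile[i], profile[i + step]
        for a, b in product(AMINO_ACIDS, repeat=2):
            features[a + b] += left.get(a, 0.0) * right.get(b, 0.0)
    # Average over positions so that sequences of different length are comparable.
    for pair in features:
        features[pair] /= n_pairs
    return features


def one_hot_profile(sequence):
    """Degenerate profile without evolutionary information: weight 1 on the observed residue."""
    return [{aa: 1.0} for aa in sequence]


features = collocation_features(one_hot_profile("ACACA"), gap=0)
```

With a one-hot profile the features reduce to plain dipeptide frequencies; substituting normalized PSI-BLAST profile columns for the one-hot rows would spread each count over the residues observed at that position in related sequences. The resulting 400-dimensional vector could then be passed to a classifier such as a support vector machine.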
Pages: 1596-1604
Number of pages: 9