Ensemble classifier for protein fold pattern recognition

Cited by: 329
Authors
Shen, Hong-Bin [1]
Chou, Kuo-Chen
Affiliations
[1] Shanghai Jiao Tong Univ, Inst Image Proc & Pattern Recognit, Shanghai 200030, Peoples R China
[2] Gordon Life Sci Inst, San Diego, CA 92130 USA
DOI
10.1093/bioinformatics/btl170
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Predicting protein folding patterns goes one level deeper than predicting protein structural classes, and is therefore considerably more complicated and difficult. To address this problem, an ensemble classifier was introduced. It consists of a set of basic classifiers, each trained on a different parameter system, such as predicted secondary structure, hydrophobicity, van der Waals volume, polarity, polarizability, and different dimensions of pseudo-amino acid composition, all extracted from a training dataset. The operation engine for the individual constituent classifiers was the OET-KNN (optimized evidence-theoretic k-nearest neighbors) rule. Their outcomes were combined through weighted voting to give the final classification of a query protein. The recognition task was to find the true fold among 27 possible patterns.
Results: The overall success rate thus obtained was 62% for a testing dataset in which most of the proteins have <25% sequence identity with the proteins used to train the classifier. This rate is 6-21% higher than the corresponding rates obtained by various existing NN (neural network) and SVM (support vector machine) approaches, implying that the ensemble classifier is very promising and might become a useful vehicle in protein science, as well as in proteomics and bioinformatics.
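The weighted-voting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' OET-KNN implementation: the function name `weighted_vote`, the example fold labels, and the weights are all hypothetical, and each basic classifier is assumed to have already produced a single predicted fold label.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine one predicted fold label per basic classifier by weighted voting.

    predictions: predicted fold labels, one per basic classifier (hypothetical input)
    weights: non-negative weights, one per basic classifier (hypothetical input)
    """
    if len(predictions) != len(weights):
        raise ValueError("one weight per classifier is required")

    # Accumulate the total weight cast for each candidate fold.
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight

    # The fold with the largest total weight wins; iterating over the
    # sorted labels makes ties resolve to the smallest label.
    return max(sorted(scores), key=lambda label: scores[label])


# Five hypothetical basic classifiers voting among fold labels 1..27.
votes = [3, 7, 3, 12, 7]
weights = [0.30, 0.25, 0.20, 0.15, 0.10]
print(weighted_vote(votes, weights))
```

In this illustration the two classifiers voting for fold 3 carry more combined weight than the two voting for fold 7, so the ensemble picks fold 3 even though the raw vote count is tied.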
Pages: 1717-1722
Number of pages: 6