An expert system to predict protein thermostability using decision tree

Citations: 40
Authors
Wu, Li-Cheng [2]
Lee, Jian-Xin [1]
Huang, Hsien-Da [3]
Liu, Baw-Juine [4]
Horng, Jorng-Tzong [1, 2]
Affiliations
[1] Natl Cent Univ, Dept Comp Sci & Informat Engn, Taipei, Taiwan
[2] Natl Cent Univ, Inst Syst Biol & Bioinformat, Taipei, Taiwan
[3] Natl Chiao Tung Univ, Inst Bioinformat, Hsinchu, Taiwan
[4] Yuan Ze Univ, Jhongli, Taiwan
Keywords
Expert system; Machine learning; Bioinformatics; Protein thermostability; Decision Tree; THERMAL-STABILITY; INSIGHTS;
DOI
10.1016/j.eswa.2008.12.020
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Protein thermostability information is closely linked to the commercial production of many biomaterials. Recent work has shown that amino acid composition, special sequence patterns, hydrogen bonds, disulfide bonds, salt bridges, and related factors are of considerable importance to thermostability. In this study, we present a system that integrates these factors to predict protein thermostability. The features of proteins in the PGTdb are analyzed. We consider both structural and sequence features, and incorporate correlation coefficients into the feature selection algorithm. Machine learning algorithms are then used to develop identification systems, and the performance of the different algorithms is compared. Two features, (E + F + M + R)/residue and charged/non-charged, are found to be critical to the thermostability of proteins. Although models that combine sequence and structural features achieve higher accuracy, sequence-only models provide sufficient accuracy for thermostability prediction from sequence alone. (C) 2008 Elsevier Ltd. All rights reserved.
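As a rough illustration of the two features named in the abstract, the sketch below computes them from a one-letter amino acid sequence. It is not the authors' implementation. The function name `thermo_features`, the output keys, and the choice of D, E, K, and R as the "charged" residues are assumptions, since the abstract does not define the charged set or give any code.

```python
from collections import Counter

# Assumption: "charged" is taken to mean Asp (D), Glu (E), Lys (K), Arg (R).
# The abstract does not say which residues the authors counted as charged.
CHARGED = frozenset("DEKR")

# Residues in the (E + F + M + R)/residue feature named in the abstract.
EFMR = frozenset("EFMR")


def thermo_features(sequence: str) -> dict[str, float]:
    """Compute two composition features from a one-letter protein sequence.

    Hypothetical helper for illustration only.
    """
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    counts = Counter(seq)
    n = len(seq)

    efmr = sum(counts[aa] for aa in EFMR)
    charged = sum(counts[aa] for aa in CHARGED)
    non_charged = n - charged

    return {
        # Fraction of residues that are E, F, M, or R.
        "efmr_per_residue": efmr / n,
        # Ratio of charged to non-charged residues; infinite if every
        # residue is charged.
        "charged_to_non_charged": (
            charged / non_charged if non_charged else float("inf")
        ),
    }


if __name__ == "__main__":
    print(thermo_features("MEEKRFAA"))
```

Values like these could be passed as input columns to any decision tree learner; how the original system thresholds or combines them is not stated in this record.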
Pages: 9007-9014
Page count: 8