Discrimination of mesophilic and thermophilic proteins using machine learning algorithms

Times cited: 76
Authors
Gromiha, M. Michael [1]
Suresh, M. Xavier [1]
Affiliation
[1] Natl Inst Adv Ind Sci & Technol, Computat Biol Res Ctr, Koto Ku, Tokyo 1350064, Japan
DOI
10.1002/prot.21616
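Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010 [Biochemistry and Molecular Biology]; 081704 [Applied Chemistry]
Abstract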
Discriminating thermophilic proteins from their mesophilic counterparts is a challenging task, and doing so would help in designing stable proteins. In this work, we have systematically analyzed the amino acid compositions of 3075 mesophilic and 1609 thermophilic proteins belonging to 9 and 15 families, respectively. We found that the charged residues Lys, Arg, and Glu, as well as the hydrophobic residues Val and Ile, occur more frequently in thermophiles than in mesophiles. Further, we have analyzed the performance of different methods based on Bayes' rule, logistic functions, neural networks, support vector machines, decision trees, and so forth for discriminating mesophilic and thermophilic proteins. We found that most of the machine learning techniques discriminate these two classes of proteins with similar accuracy. The neural network-based method discriminated thermophiles from mesophiles with a five-fold cross-validation accuracy of 89% on a dataset of 4684 proteins. Moreover, when tested on 325 mesophilic proteins from Xylella fastidiosa and 382 thermophilic proteins from Aquifex aeolicus, the method discriminated them with an accuracy of 91%. These accuracy levels are better than those of other methods in the literature, and we suggest that this method could be used effectively to discriminate mesophilic and thermophilic proteins.
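The abstract's pipeline rests on two ideas: representing each protein by its amino acid composition (the percentage of each of the 20 standard residues) and scoring a classifier by five-fold cross-validation. The sketch below illustrates only those two ideas, under stated assumptions: it uses a simple nearest-centroid classifier as a stand-in for the Bayes, logistic, neural-network, SVM, and decision-tree methods named in the abstract, assigns folds round-robin, and runs on tiny synthetic sequences rather than the 4684-protein dataset. All function names (`composition`, `k_fold_accuracy`, and so on) are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes), in a fixed feature order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> list[float]:
    """Percentage of each standard amino acid in seq (20-dim vector).

    Non-standard letters (e.g. X, B, Z) are skipped before normalizing.
    """
    counts = Counter(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * counts[aa] / total for aa in AMINO_ACIDS]


def centroid(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(centroids: dict[str, list[float]], x: list[float]) -> str:
    """Nearest-centroid rule: the label whose class mean is closest to x."""
    return min(centroids, key=lambda label: sq_dist(centroids[label], x))


def k_fold_accuracy(seqs: list[str], labels: list[str], k: int = 5) -> float:
    """Nearest-centroid accuracy under k-fold cross-validation.

    Assumption: sequence i goes to fold i % k (round-robin), and each fold
    is predicted by centroids fitted on the other k - 1 folds.
    """
    feats = [composition(s) for s in seqs]
    correct = 0
    for fold in range(k):
        test_idx = [i for i in range(len(seqs)) if i % k == fold]
        train_idx = [i for i in range(len(seqs)) if i % k != fold]
        centroids = {
            label: centroid([feats[i] for i in train_idx if labels[i] == label])
            for label in {labels[i] for i in train_idx}
        }
        correct += sum(predict(centroids, feats[i]) == labels[i] for i in test_idx)
    return correct / len(seqs)


# Toy, synthetic data (NOT from the paper): five rotations of one string per
# class, so every sequence within a class has identical composition.
thermo_base, meso_base = "KREVIKREVI", "ASTNQGDASTNQGD"
seqs = [b[i:] + b[:i] for b in (thermo_base, meso_base) for i in range(5)]
labels = ["thermophile"] * 5 + ["mesophile"] * 5
print(k_fold_accuracy(seqs, labels, k=5))  # → 1.0
```

Because each toy class is built from rotations of a single string, the two classes are perfectly separable by composition and the printed accuracy is 1.0; this says nothing about performance on real proteins. Swapping in another classifier only changes how `predict` is fitted inside each fold; the composition features and the fold loop stay the same.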
Pages: 1274-1279
Number of pages: 6