Genetic programming for creating Chou's pseudo amino acid based features for submitochondria localization

Cited by: 159
Authors
Nanni, Loris [1]
Lumini, Alessandra [1]
Affiliations
[1] Univ Bologna, DEIS, CNR, IEIIT, I-40136 Bologna, Italy
Keywords
submitochondria localization; Chou's pseudo amino acid; genetic programming
DOI
10.1007/s00726-007-0018-1
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Given a protein that is localized in the mitochondria, knowing its submitochondria localization is very important for understanding its function. In this work, we propose a submitochondria localizer whose feature extraction method is based on Chou's pseudo-amino acid composition. The pseudo-amino acid based features are obtained by combining pseudo-amino acid compositions with hundreds of amino-acid indices and amino-acid substitution matrices; from this huge set of features, a small set of 15 "artificial" features is then created. The feature creation is performed by genetic programming, which combines one or more "original" features by means of mathematical operators. Finally, the set of combined features is used to train a radial basis function support vector machine. This method is named GP-Loc. We also propose a method with very few parameters, named ALL-Loc, in which all the "original" features are used to train a linear support vector machine. The overall prediction accuracy obtained by GP-Loc is 89% under jackknife cross-validation, which outperforms the performance reported in the literature (85.2%) on the same dataset. The overall prediction accuracy obtained by ALL-Loc is 83.9%.
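As a rough illustration of the kind of "original" feature the abstract refers to, the following minimal sketch computes a pseudo-amino acid composition vector in the commonly cited form: 20 amino-acid frequencies plus `lam` sequence-order correlation factors. It is not the authors' implementation. The single Kyte-Doolittle hydropathy index, the function name `pseaac`, and the defaults `lam=3` and `weight=0.05` are assumptions chosen for illustration; the paper combines hundreds of indices and substitution matrices.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used here as one example amino-acid index
# (assumption: any physicochemical index could be substituted).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(index):
    """Rescale an amino-acid index to zero mean and unit variance over the 20 residues."""
    values = [index[a] for a in AMINO_ACIDS]
    mu, sigma = mean(values), pstdev(values)
    return {a: (index[a] - mu) / sigma for a in AMINO_ACIDS}


def pseaac(sequence, index=KYTE_DOOLITTLE, lam=3, weight=0.05):
    """Return a (20 + lam)-dimensional pseudo-amino acid composition vector."""
    # Non-standard residue codes (e.g. X, B, Z) are silently dropped in this sketch.
    seq = [a for a in sequence.upper() if a in AMINO_ACIDS]
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must contain more than `lam` standard residues")
    h = standardize(index)
    # Conventional amino-acid composition: occurrence frequency of each residue.
    freqs = [seq.count(a) / n for a in AMINO_ACIDS]
    # k-th tier correlation factor: mean squared index difference
    # between residues that are k positions apart.
    thetas = [
        sum((h[seq[i]] - h[seq[i + k]]) ** 2 for i in range(n - k)) / (n - k)
        for k in range(1, lam + 1)
    ]
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


features = pseaac("MKTAYIAKQRQISFVKSHFSRQ")
```

The resulting vector sums to 1 by construction. In a pipeline like the one described, vectors of this kind, computed for many different indices, would form the pool of "original" features passed to feature construction and then to a support vector machine.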
Pages: 653-660
Page count: 8