Predicting subcellular localization of proteins for Gram-negative bacteria by support vector machines based on n-peptide compositions

Citations: 737
Authors
Yu, CS
Lin, CJ [1]
Hwang, JK
Affiliations
[1] Natl Chiao Tung Univ, Dept Biol Sci & Technol, Hsinchu 30050, Taiwan
[2] Natl Chiao Tung Univ, Inst Bioinformat, Hsinchu 30050, Taiwan
[3] Natl Taiwan Univ, Dept Comp Sci, Taipei 10617, Taiwan
Keywords
subcellular localization; support vector machine; Gram-negative bacteria; machine-learning method; proteome; genome; n-peptide compositions
DOI
10.1110/ps.03479604
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Gram-negative bacteria have five major subcellular localization sites: the cytoplasm, the periplasm, the inner membrane, the outer membrane, and the extracellular space. The subcellular location of a protein can provide valuable information about its function. With the rapid increase of sequenced genomic data, the need for an automated and accurate tool to predict subcellular localization becomes increasingly important. We present an approach to predict subcellular localization for Gram-negative bacteria. This method uses support vector machines trained on multiple feature vectors based on n-peptide compositions. For a standard data set comprising 1443 proteins, the overall prediction accuracy reaches 89%, which, to the best of our knowledge, is the highest prediction rate ever reported. Our prediction accuracy is 14% higher than that of the recently developed multimodular PSORT-B. Because of its simplicity, this approach can be easily extended to other organisms and should be a useful tool for the high-throughput and large-scale analysis of proteomic and genomic data.
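As an illustration of what an n-peptide composition feature vector might look like, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the feature vectors were built, so the function name `n_peptide_composition`, the fixed 20-letter alphabet, the skipping of non-standard residues, and the made-up example sequence are all assumptions.

```python
from itertools import product

# The 20 standard amino acids in one-letter code (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def n_peptide_composition(sequence, n=1):
    """Return the relative frequency of every possible n-peptide.

    The output has 20**n entries in a fixed (lexicographic) order, so
    vectors from different proteins are directly comparable.
    """
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    counts = dict.fromkeys(keys, 0)
    total = 0
    for i in range(len(sequence) - n + 1):
        window = sequence[i:i + n]
        if window in counts:  # skip windows with non-standard residues
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(keys)
    return [counts[k] / total for k in keys]

# Hypothetical sequence, used only to show the shapes involved.
example = "MKKLLAAG"
aa = n_peptide_composition(example, n=1)  # amino acid composition, 20 values
dp = n_peptide_composition(example, n=2)  # dipeptide composition, 400 values
features = aa + dp                        # one possible combined vector
```

Vectors of this kind, one per protein and labeled with a localization site, could then be used as input to a multiclass SVM implementation.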
Pages: 1402-1406
Page count: 5