Using string kernel to predict signal peptide cleavage site based on subsite coupling model

Cited by: 47
Authors
Wang, M
Yang, J
Chou, KC
Affiliations
[1] Gordon Life Sci Inst, San Diego, CA 92130 USA
[2] Shanghai Jiao Tong Univ, Inst Image Proc & Pattern Recognit, Shanghai 200030, Peoples R China
[3] Microsoft Res Asia, Beijing, Peoples R China
[4] Inst Bioinformat & Drug Discovery, Tianjin, Peoples R China
Keywords
signal peptide; Chou's subsite coupling approach; probabilistic model; string kernels; support vector machine;
DOI
10.1007/s00726-005-0189-6
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Signal peptides are important for studying the molecular mechanisms of genetic diseases, reprogramming cells for gene therapy, and finding new drugs to treat a specific defect, so there is strong demand for a fast and accurate method to identify them. The introduction of the so-called {-3, -1, +1} coupling model (Chou, K. C.: Protein Engineering, 2001, 14(2), 75-79) has made it possible to take into account the coupling effect among some key subsites, and hence can significantly enhance the quality of peptide cleavage site prediction. Based on the subsite coupling model, a family of string kernels for protein sequences is introduced. The constructed string kernels integrate biologically relevant prior knowledge and can be used by any kernel-based method. A support vector machine (SVM) is then built to predict the cleavage site of signal peptides from protein sequences. The current approach is compared with the classical weight matrix method. At small false positive ratios, our method outperforms the classical weight matrix method, indicating that the current approach may at least serve as a powerful complementary tool to other existing methods for predicting the signal peptide cleavage site. The software that generated the results reported in this paper is available upon request, and will appear at http://www.pami.sjtu.edu.cn/wm.
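The abstract does not give the form of the string kernels, so the following is only a minimal illustrative sketch of the general idea, not the authors' kernel. It assumes a candidate cleavage site is represented by the residues at subsites -3, -1 and +1, and that similarity is the number of matching subsites plus a bonus when all three match jointly (a crude stand-in for the coupling effect). The function names, the weights `w_single` and `w_joint`, and the example sequence are all hypothetical.

```python
from typing import List, Sequence, Tuple

# Subsite positions relative to the scissile bond, as in the {-3, -1, +1} model.
SUBSITES = (-3, -1, +1)


def subsite_residues(seq: str, cleavage: int) -> Tuple[str, ...]:
    """Return the residues at subsites -3, -1, +1 for a candidate site.

    The candidate bond lies between seq[cleavage - 1] (subsite -1)
    and seq[cleavage] (subsite +1), using 0-based string indexing.
    """
    idx = [cleavage + p if p < 0 else cleavage + p - 1 for p in SUBSITES]
    if min(idx) < 0 or max(idx) >= len(seq):
        raise ValueError("candidate site too close to a sequence end")
    return tuple(seq[i] for i in idx)


def coupling_kernel(a: Sequence[str], b: Sequence[str],
                    w_single: float = 1.0, w_joint: float = 3.0) -> float:
    """Hypothetical kernel: per-subsite matches plus a joint-match bonus.

    Each term is a delta kernel scaled by a non-negative weight, so the
    sum is positive semidefinite and usable by kernel-based learners.
    """
    matches = sum(x == y for x, y in zip(a, b))
    joint = w_joint if tuple(a) == tuple(b) else 0.0
    return w_single * matches + joint


def gram_matrix(triples: List[Sequence[str]]) -> List[List[float]]:
    """Pairwise kernel values, e.g. for an SVM with a precomputed kernel."""
    return [[coupling_kernel(a, b) for b in triples] for a in triples]


if __name__ == "__main__":
    # Made-up sequence, used only to show the indexing.
    seq = "MKLLVLSLCFATAQAEDK"
    candidates = [subsite_residues(seq, c) for c in (13, 15, 16)]
    for row in gram_matrix(candidates):
        print(row)
```

A Gram matrix of this kind could be passed to any SVM implementation that accepts precomputed kernels. The actual kernels in the paper may weight subsites or model the coupling differently.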
Pages: 395-402
Number of pages: 8
Related papers
31 in total
[1]   Predicting protein-protein interactions from primary structure [J].
Bock, JR ;
Gough, DA .
BIOINFORMATICS, 2001, 17 (05) :455-460
[2]   Support vector machines for predicting membrane protein types by using functional domain composition [J].
Cai, YD ;
Zhou, GP ;
Chou, KC .
BIOPHYSICAL JOURNAL, 2003, 84 (05) :3257-3263
[3]   Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes [J].
Chou, KC .
BIOINFORMATICS, 2005, 21 (01) :10-19
[4]   Using GO-PseAA predictor to predict enzyme sub-class [J].
Chou, KC ;
Cai, YD .
BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS, 2004, 325 (02) :506-509
[5]   Predicting enzyme family class in a hybridization space [J].
Chou, KC ;
Cai, YD .
PROTEIN SCIENCE, 2004, 13 (11) :2857-2863
[6]   Structural bioinformatics and its impact to biomedical science [J].
Chou, KC .
CURRENT MEDICINAL CHEMISTRY, 2004, 11 (16) :2105-2134
[7]   A novel approach to predict active sites of enzyme molecules [J].
Chou, KC ;
Cai, YD .
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2004, 55 (01) :77-82
[8]   Prediction of human immunodeficiency virus protease cleavage sites in proteins [J].
Chou, KC .
ANALYTICAL BIOCHEMISTRY, 1996, 233 (01) :1-14
[9]   A new hybrid approach to predict subcellular localization of proteins by incorporating gene ontology [J].
Chou, KC ;
Cai, YD .
BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS, 2003, 311 (03) :743-747
[10]   Prediction of protein signal sequences [J].
Chou, KC .
CURRENT PROTEIN & PEPTIDE SCIENCE, 2002, 3 (06) :615-622