Weighted-support vector machines for predicting membrane protein types based on pseudo-amino acid composition

Times cited: 174
Authors
Wang, M [1]
Yang, J
Liu, GP
Xu, ZJ
Chou, KC
Affiliations
[1] Shanghai Jiao Tong Univ, Inst Image Proc & Pattern Recognit, Shanghai 200030, Peoples R China
[2] Donghua Univ, Bioinformat Res Ctr, Shanghai 200050, Peoples R China
[3] Shanghai Jiao Tong Univ, Dept Biomed Engn, Shanghai 200030, Peoples R China
[4] Gordon Life Sci Inst, San Diego, CA 92130 USA
[5] Tianjin Inst Bioinformat & Drug Discovery, Tianjin, Peoples R China
Keywords
Chou's invariance theorem; covariant discriminant algorithm; pseudo-amino acid composition; spectral analysis; weighted ν-SVM
DOI
10.1093/protein/gzh061
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Membrane proteins are generally classified into the following five types: (1) type I membrane proteins, (2) type II membrane proteins, (3) multipass transmembrane proteins, (4) lipid chain-anchored membrane proteins and (5) GPI-anchored membrane proteins. Prediction of membrane protein types has become one of the growing hot topics in bioinformatics. Currently, we are facing two critical challenges in this area: first, how to take into account the extremely complicated sequence-order effects, and second, how to deal with the highly uneven sizes of the subsets in a training dataset. In this paper, stimulated by the concept of using the pseudo-amino acid composition to incorporate the sequence-order effects, the spectral analysis technique is introduced to represent the statistical sample of a protein. Based on such a framework, the weighted support vector machine (SVM) algorithm is applied. The new approach has remarkable power in dealing with the bias caused by the situation when one subset in the training dataset contains many more samples than the other. The new method is particularly useful when our focus is aimed at proteins belonging to small subsets. The results obtained by the self-consistency test, jackknife test and independent dataset test are encouraging, indicating that the current approach may serve as a powerful complementary tool to other existing methods for predicting the types of membrane proteins.
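The abstract does not say how the weighted SVM assigns its per-class weights. As a rough illustration of how uneven subset sizes can be counteracted, the sketch below uses the common "balanced" heuristic, where each class weight is inversely proportional to the class size. The function name `balanced_class_weights` and the toy label counts are assumptions made for this example, not taken from the paper.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class, inversely proportional to class size.

    Hypothetical helper: weight_c = n_samples / (n_classes * n_c).
    This is a generic heuristic, not necessarily the paper's scheme.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n_c) for cls, n_c in counts.items()}


# Toy training labels (made up): one large subset and one small subset.
labels = ["type I"] * 8 + ["GPI-anchored"] * 2
weights = balanced_class_weights(labels)

for cls, w in weights.items():
    print(f"{cls}: {w}")
```

With weights like these, a misclassified sample from the small subset would cost more in the SVM objective than one from the large subset, which is the general idea behind reducing the bias toward the larger class.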
Pages: 509-516
Page count: 8