Weighted-support vector machines for predicting membrane protein types based on pseudo-amino acid composition

Times cited: 174
Authors
Wang, M [1]
Yang, J
Liu, GP
Xu, ZJ
Chou, KC
Affiliations
[1] Shanghai Jiao Tong Univ, Inst Image Proc & Pattern Recognit, Shanghai 200030, Peoples R China
[2] Donghua Univ, Bioinformat Res Ctr, Shanghai 200050, Peoples R China
[3] Shanghai Jiao Tong Univ, Dept Biomed Engn, Shanghai 200030, Peoples R China
[4] Gordon Life Sci Inst, San Diego, CA 92130 USA
[5] Tianjin Inst Bioinformat & Drug Discovery, Tianjin, Peoples R China
Keywords
Chou's invariance theorem; covariant discriminant algorithm; pseudo-amino acid composition; spectral analysis; weighted ν-SVM
DOI
10.1093/protein/gzh061
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Membrane proteins are generally classified into the following five types: (1) type I membrane proteins, (2) type II membrane proteins, (3) multipass transmembrane proteins, (4) lipid chain-anchored membrane proteins and (5) GPI-anchored membrane proteins. Prediction of membrane protein types has become one of the growing hot topics in bioinformatics. Currently, we are facing two critical challenges in this area: first, how to take into account the extremely complicated sequence-order effects, and second, how to deal with the highly uneven sizes of the subsets in a training dataset. In this paper, stimulated by the concept of using the pseudo-amino acid composition to incorporate the sequence-order effects, the spectral analysis technique is introduced to represent the statistical sample of a protein. Based on such a framework, the weighted support vector machine (SVM) algorithm is applied. The new approach has remarkable power in dealing with the bias caused by the situation when one subset in the training dataset contains many more samples than the other. The new method is particularly useful when our focus is aimed at proteins belonging to small subsets. The results obtained by the self-consistency test, jackknife test and independent dataset test are encouraging, indicating that the current approach may serve as a powerful complementary tool to other existing methods for predicting the types of membrane proteins.
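The two challenges named in the abstract, representing a sequence as a feature vector and coping with uneven subset sizes, can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: the feature is the plain 20-dimensional amino acid composition (without the sequence-order terms that the pseudo-amino acid composition adds), the per-class weights use a common inverse-frequency heuristic rather than the weights used by the authors, and the function names and toy labels are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the 20-D vector of residue frequencies for one sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [counts[a] / total for a in AMINO_ACIDS]


def inverse_frequency_weights(labels):
    """Assumed heuristic: weight = n_samples / (n_classes * class_count).

    Small classes receive large weights, so misclassifying one of their
    samples costs more during training of a weighted classifier.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * m) for c, m in counts.items()}


# Hypothetical, deliberately unbalanced toy training labels.
labels = ["type I"] * 6 + ["multipass"] * 12 + ["GPI-anchored"] * 2
weights = inverse_frequency_weights(labels)

# Toy sequence, only to show the shape of the feature vector.
composition = amino_acid_composition("AAC")
```

Weights of this kind would typically be passed to a weighted SVM as per-class penalty multipliers, so that the smallest subset (here the toy "GPI-anchored" class) is not swamped by the largest one.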
Pages: 509-516
Number of pages: 8