Weighted-support vector machines for predicting membrane protein types based on pseudo-amino acid composition

Cited by: 174
Authors
Wang, M [1]
Yang, J
Liu, GP
Xu, ZJ
Chou, KC
Affiliations
[1] Shanghai Jiao Tong Univ, Inst Image Proc & Pattern Recognit, Shanghai 200030, Peoples R China
[2] Donghua Univ, Bioinformat Res Ctr, Shanghai 200050, Peoples R China
[3] Shanghai Jiao Tong Univ, Dept Biomed Engn, Shanghai 200030, Peoples R China
[4] Gordon Life Sci Inst, San Diego, CA 92130 USA
[5] Tianjin Inst Bioinformat & Drug Discovery, Tianjin, Peoples R China
Keywords
Chou's invariance theorem; covariant discriminant algorithm; pseudo-amino acid composition; spectral analysis; weighted ν-SVM
DOI
10.1093/protein/gzh061
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Membrane proteins are generally classified into the following five types: (1) type I membrane proteins, (2) type II membrane proteins, (3) multipass transmembrane proteins, (4) lipid chain-anchored membrane proteins and (5) GPI-anchored membrane proteins. Prediction of membrane protein types has become one of the growing hot topics in bioinformatics. Currently, we are facing two critical challenges in this area: first, how to take into account the extremely complicated sequence-order effects, and second, how to deal with the highly uneven sizes of the subsets in a training dataset. In this paper, stimulated by the concept of using the pseudo-amino acid composition to incorporate the sequence-order effects, the spectral analysis technique is introduced to represent the statistical sample of a protein. Based on such a framework, the weighted support vector machine (SVM) algorithm is applied. The new approach has remarkable power in dealing with the bias caused by the situation when one subset in the training dataset contains many more samples than the other. The new method is particularly useful when our focus is aimed at proteins belonging to small subsets. The results obtained by the self-consistency test, jackknife test and independent dataset test are encouraging, indicating that the current approach may serve as a powerful complementary tool to other existing methods for predicting the types of membrane proteins.
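The class-imbalance idea in the abstract can be illustrated with a minimal sketch. It is not the authors' weighted ν-SVM: it swaps in a simpler, commonly used scheme in which each class gets a penalty weight inversely proportional to its size, applied to an ordinary hinge loss. The function names and the weighting formula N / (K * n_class) are assumptions made for illustration only.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {class: weight} with weight = N / (K * n_class).

    Assumed heuristic (not taken from the paper): smaller classes
    receive proportionally larger misclassification penalties.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


def weighted_hinge_loss(scores, targets, weights):
    """Mean hinge loss with each sample scaled by its class weight.

    scores  : real-valued decision outputs f(x)
    targets : labels in {+1, -1}
    weights : mapping from label to class weight
    """
    total = 0.0
    for score, target in zip(scores, targets):
        total += weights[target] * max(0.0, 1.0 - target * score)
    return total / len(scores)


# Toy imbalanced dataset: 2 samples in the small class, 8 in the large one.
labels = [+1] * 2 + [-1] * 8
weights = inverse_frequency_weights(labels)

# An uninformative classifier (all scores zero) incurs the same total
# penalty from each class once the weights are applied.
loss = weighted_hinge_loss([0.0] * len(labels), labels, weights)
```

With these weights, an error on a sample from the small class costs four times as much as an error on a sample from the large class, which is the kind of bias correction the abstract describes for small subsets.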
Pages: 509-516
Number of pages: 8