Prediction of transporter family from protein sequence by support vector machine approach

Cited by: 56
Authors
Lin, HH
Han, LY
Cai, CZ
Ji, ZL
Chen, YZ
Affiliations
[1] Natl Univ Singapore, Dept Computat Sci, Bioinformat & Drug Design Grp, Singapore 117543, Singapore
[2] Xiamen Univ, Sch Life Sci, Key Lab Chem Biol Fujian Province, Xiamen, Peoples R China
Keywords
channel; transporter; support vector machine
DOI
10.1002/prot.20605
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Transporters play key roles in cellular transport and metabolic processes, and in facilitating drug delivery and excretion. These proteins are classified into families based on the transporter classification (TC) system. Determination of the TC family of transporters facilitates the study of their cellular and pharmacological functions. Methods for predicting TC family without sequence alignments or clustering are particularly useful for studying novel transporters whose function cannot be determined by sequence similarity. This work explores the use of a machine learning method, support vector machines (SVMs), for predicting the family of transporters from their sequence without the use of sequence similarity. A total of 10,636 transporters in 13 TC subclasses, 1914 transporters in eight TC families, and 168,341 nontransporter proteins are used to train and test the SVM prediction system. Testing results by using a separate set of 4351 transporters and 83,151 nontransporter proteins show that the overall accuracy for predicting members of these TC subclasses and families is 83.4% and 88.0%, respectively, and that of nonmembers is 99.3% and 96.6%, respectively. The accuracies for predicting members and nonmembers of individual TC subclasses are in the range of 70.7-96.1% and 97.6-99.9%, respectively, and those of individual TC families are in the range of 60.6-97.1% and 91.5-99.4%, respectively. A further test by using 26,139 transmembrane proteins outside each of the 13 TC subclasses shows that 90.4-99.6% of these are correctly predicted. Our study suggests that the SVM is potentially useful for facilitating functional study of transporters irrespective of sequence similarity.
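The member and nonmember accuracies quoted above can be read as per-class rates. The following is a minimal sketch, assuming (the abstract does not state the formulas) that member accuracy is TP / (TP + FN) and nonmember accuracy is TN / (TN + FP); the function name and the toy labels are hypothetical and are not taken from the paper.

```python
def member_nonmember_accuracy(y_true, y_pred):
    """Return (member accuracy, nonmember accuracy) for binary labels.

    Assumed convention: 1 marks a member of a TC family or subclass,
    0 marks a nonmember.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # members found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # members missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # nonmembers rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # nonmembers accepted

    # Guard against an empty class so the division never fails.
    member_acc = tp / (tp + fn) if (tp + fn) else float("nan")
    nonmember_acc = tn / (tn + fp) if (tn + fp) else float("nan")
    return member_acc, nonmember_acc


# Toy data only: 4 members and 6 nonmembers, not results from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

member_acc, nonmember_acc = member_nonmember_accuracy(y_true, y_pred)
print(f"member accuracy:    {member_acc:.3f}")
print(f"nonmember accuracy: {nonmember_acc:.3f}")
```

Reporting the two rates separately matters here because nonmembers greatly outnumber members in the test set, so a single pooled accuracy would be dominated by the nonmember class.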
Pages: 218-231
Page count: 14