GPCRpred: an SVM-based method for prediction of families and subfamilies of G-protein coupled receptors

Cited by: 137
Authors
Bhasin, M [1]
Raghava, GPS [1]
Affiliation
[1] Inst Microbial Technol, Chandigarh 160036, India
DOI
10.1093/nar/gkh416
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
G-protein coupled receptors (GPCRs) belong to one of the largest superfamilies of membrane proteins and are important targets for drug design. In this study, a support vector machine (SVM)-based method, GPCRpred, has been developed for predicting families and subfamilies of GPCRs from the dipeptide composition of proteins. The dataset used in this study for training and testing was obtained from http://www.soe.ucsc.edu/research/compbio/gpcr/. The method classified GPCRs and non-GPCRs with an accuracy of 99.5% when evaluated using 5-fold cross-validation. The method is further able to predict five major classes or families of GPCRs with an overall Matthews correlation coefficient (MCC) and accuracy of 0.81 and 97.5%, respectively. In recognizing the subfamilies of the rhodopsin-like family, the method achieved an average MCC and accuracy of 0.97 and 97.3%, respectively. When evaluated on an independent (blind) dataset of 650 GPCRs, the method achieved an overall accuracy of 91.3% at the family level and 96.4% at the subfamily level. A server for recognition and classification of GPCRs based on multiclass SVMs has been set up at http://www.imtech.res.in/raghava/gpcrpred/. We have also suggested subfamilies for 42 sequences that were previously identified as unclassified Class A GPCRs. Supplementary information is available at http://www.imtech.res.in/raghava/gpcrpred/info.html.
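The abstract rests on two quantities worth making concrete: the dipeptide-composition feature vector fed to the SVMs, and the Matthews correlation coefficient (MCC) used to report performance. The Python sketch below computes both, but it is only an illustration under stated assumptions, not the GPCRpred code. It assumes the common definition of dipeptide composition (counts of the 400 ordered pairs of the 20 standard amino acids divided by the number of overlapping pairs), it skips any pair containing a non-standard residue such as X, and it computes MCC and accuracy from a single one-vs-rest confusion matrix. The names `dipeptide_composition`, `mcc`, and `accuracy` are hypothetical.

```python
import math
from itertools import product

# The 20 standard amino acids (one-letter codes). Pairs containing any other
# symbol (e.g. X, B, Z) are skipped; this handling is an assumption of the sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 400 ordered dipeptides in a fixed order: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence: str) -> list[float]:
    """Return a 400-element vector of dipeptide fractions for one protein.

    Each entry is the count of one overlapping dipeptide divided by the
    total number of valid overlapping dipeptides in the sequence
    (hypothetical helper; the normalisation is an assumed definition).
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # ignore pairs with non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        # Too short, or no valid pairs: return an all-zero feature vector.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for one binary (one-vs-rest) confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal sum is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of correctly classified examples."""
    return (tp + tn) / (tp + tn + fp + fn)


if __name__ == "__main__":
    vec = dipeptide_composition("MKTAYIAK")  # 7 overlapping pairs, each seen once
    print(len(vec), vec[DIPEPTIDES.index("MK")])
    print(mcc(tp=90, tn=95, fp=5, fn=10), accuracy(tp=90, tn=95, fp=5, fn=10))
```

Unlike plain accuracy, MCC stays informative when positives and negatives are heavily unbalanced, which is one reason it is often reported alongside accuracy for one-vs-rest classification. For a multiclass task such as family-level prediction, a per-class MCC would come from each class's one-vs-rest counts; how GPCRpred combined these into the "overall" and "average" figures quoted above is not stated in this record.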
Pages: W383-W389
Number of pages: 7