Support vector machine-based method for subcellular localization of human proteins using amino acid compositions, their order, and similarity search

被引:179
作者
Garg, A [1 ]
Bhasin, M [1 ]
Raghava, GPS [1 ]
机构
[1] Inst Microbial Technol, Bioinformat Ctr, Chandigarh, India
关键词
D O I
10.1074/jbc.M411789200
中图分类号
Q5 [生物化学]; Q7 [分子生物学];
学科分类号
071010 ; 081704 ;
摘要
Here we report a systematic approach for predicting subcellular localization ( cytoplasm, mitochondrial, nuclear, and plasma membrane) of human proteins. First, support vector machine (SVM)-based modules for predicting subcellular localization using traditional amino acid and dipeptide (i + 1) composition achieved overall accuracy of 76.6 and 77.8%, respectively. PSI-BLAST, when carried out using a similarity-based search against a nonredundant data base of experimentally annotated proteins, yielded 73.3% accuracy. To gain further insight, a hybrid module (hybrid1) was developed based on amino acid composition, dipeptide composition, and similarity information and attained better accuracy of 84.9%. In addition, SVM modules based on a different higher order dipeptide i. e. i + 2, i + 3, and i + 4 were also constructed for the prediction of subcellular localization of human proteins, and overall accuracy of 79.7, 77.5, and 77.1% was accomplished, respectively. Furthermore, another SVM module hybrid2 was developed using traditional dipeptide ( i + 1) and higher order dipeptide ( i + 2, i + 3, and i + 4) compositions, which gave an overall accuracy of 81.3%. We also developed SVM module hybrid3 based on amino acid composition, traditional and higher order dipeptide compositions, and PSI-BLAST output and achieved an overall accuracy of 84.4%. A Web server HSLPred (www.imtech. res. in/raghava/hslpred/ or bioinformatics. uams. edu/ raghava/hslpred/) has been designed to predict subcellular localization of human proteins using the above approaches.
引用
收藏
页码:14427 / 14432
页数:6
相关论文
共 25 条
[1]  
Alberts B, 1994, MOL BIOL CELL, P1255
[2]   BASIC LOCAL ALIGNMENT SEARCH TOOL [J].
ALTSCHUL, SF ;
GISH, W ;
MILLER, W ;
MYERS, EW ;
LIPMAN, DJ .
JOURNAL OF MOLECULAR BIOLOGY, 1990, 215 (03) :403-410
[3]   The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000 [J].
Bairoch, A ;
Apweiler, R .
NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :45-48
[4]   Extensive feature detection of N-terminal protein sorting signals [J].
Bannai, H ;
Tamada, Y ;
Maruyama, O ;
Nakai, K ;
Miyano, S .
BIOINFORMATICS, 2002, 18 (02) :298-305
[5]   ESLpred: SVM-based method for subcellular localization of eukaryotic proteins using dipeptide composition and PSI-BLAST [J].
Bhasin, M ;
Raghava, GPS .
NUCLEIC ACIDS RESEARCH, 2004, 32 :W414-W419
[6]   PROSET - A FAST PROCEDURE TO CREATE NONREDUNDANT SETS OF PROTEIN SEQUENCES [J].
BRENDEL, V .
MATHEMATICAL AND COMPUTER MODELLING, 1992, 16 (6-7) :37-43
[7]   Knowledge-based analysis of microarray gene expression data by using support vector machines [J].
Brown, MPS ;
Grundy, WN ;
Lin, D ;
Cristianini, N ;
Sugnet, CW ;
Furey, TS ;
Ares, M ;
Haussler, D .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2000, 97 (01) :262-267
[8]   Using functional domain composition and support vector machines for prediction of protein subcellular location [J].
Chou, KC ;
Cai, YD .
JOURNAL OF BIOLOGICAL CHEMISTRY, 2002, 277 (48) :45765-45769
[9]   Protein subcellular location prediction [J].
Chou, KC ;
Elrod, DW .
PROTEIN ENGINEERING, 1999, 12 (02) :107-118
[10]   A NOVEL-APPROACH TO PREDICTING PROTEIN STRUCTURAL CLASSES IN A (20-1)-D AMINO-ACID-COMPOSITION SPACE [J].
CHOU, KC .
PROTEINS-STRUCTURE FUNCTION AND GENETICS, 1995, 21 (04) :319-344