Prediction of RNA binding sites in a protein using SVM and PSSM profile

Cited by: 239
Authors
Kumar, Manish [1]
Gromiha, M. Michael [2]
Raghava, G. P. S. [1]
Affiliations
[1] Inst Microbial Technol, Bioinformat Ctr, Chandigarh 160036, India
[2] Natl Inst Adv Ind Sci & Technol, Computat Biol Res Ctr, Tokyo 1350064, Japan
Keywords
evolutionary information; interacting residue; protein; RNA; SVM
DOI
10.1002/prot.21677
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline code
071010 [Biochemistry and Molecular Biology]; 081704 [Applied Chemistry]
Abstract
RNA-binding proteins (RBPs) play key roles in the post-transcriptional control of gene expression, which, along with transcriptional regulation, is a major way of regulating patterns of gene expression during development. Identifying and predicting RNA binding sites is therefore an important step toward a comprehensive understanding of how RBPs control organism development. By combining evolutionary information with a support vector machine (SVM), we have developed an improved method for predicting RNA binding sites (RNA-interacting residues) in a protein sequence. The prediction models developed in this study were trained and tested on 86 RNA-binding protein chains and evaluated using fivefold cross-validation. First, an SVM model was developed that achieved a maximum Matthews correlation coefficient (MCC) of 0.31. The MCC improved from 0.31 to 0.45 when evolutionary information from multiple sequence alignments, in the form of position-specific scoring matrix (PSSM) profiles, was used as input to the SVM; this is higher than the maximum MCC (0.41) achieved by previous methods on the same dataset. In addition, SVM models were developed on an alternative dataset of 107 RBP chains; with PSSM profiles as SVM input, training and testing on this dataset achieved a maximum MCC of 0.32. In conclusion, the SVM models developed in this study outperform the existing methods on the same datasets. A web server, 'Pprint', has also been developed for predicting RNA binding residues in a protein sequence and is freely available at http://www.imtech.res.in/raghava/pprint/.
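The abstract reports performance as Matthews correlation coefficients under fivefold cross-validation, with PSSM profiles as the SVM input. The Python sketch below illustrates those three pieces under stated assumptions; it is not the authors' Pprint code. The function names (`mcc`, `window_features`, `five_fold_assignments`), the zero-padded sliding window, the `half_width` parameter, the 20-column PSSM layout, and the chain-level fold assignment are all hypothetical choices, because the abstract gives no window size, PSSM scaling, or SVM parameters. The SVM training step itself is omitted; the per-residue vectors could be passed to any SVM implementation.

```python
import math
import random
from typing import List, Sequence


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from a binary confusion matrix.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when any marginal sum is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def window_features(
    pssm: Sequence[Sequence[float]], center: int, half_width: int
) -> List[float]:
    """Flatten the PSSM rows in a window centred on one residue.

    Assumption: pssm holds one row of scores (e.g. 20 columns) per residue.
    Hypothetical choice: positions outside the chain are zero-padded.
    """
    n_cols = len(pssm[0])
    features: List[float] = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(pssm):
            features.extend(float(x) for x in pssm[pos])
        else:
            features.extend([0.0] * n_cols)
    return features


def five_fold_assignments(n_chains: int, k: int = 5, seed: int = 0) -> List[int]:
    """Assign each protein chain (not each residue) to one of k folds.

    Chains are shuffled with a fixed seed, then dealt round-robin so
    fold sizes differ by at most one.
    """
    order = list(range(n_chains))
    random.Random(seed).shuffle(order)
    folds = [0] * n_chains
    for rank, chain_idx in enumerate(order):
        folds[chain_idx] = rank % k
    return folds


if __name__ == "__main__":
    # Toy confusion matrix with illustrative numbers, not results from the paper.
    print(round(mcc(tp=40, tn=50, fp=5, fn=5), 3))  # → 0.798
    # 86 chains, matching the size of the main dataset in the abstract.
    folds = five_fold_assignments(86)
    print([folds.count(i) for i in range(5)])
```

Assigning folds by chain rather than by residue keeps neighbouring residues of one protein out of both the training and test sets at once; whether the original study split its data this way is not stated in the abstract.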
Pages: 189–194
Page count: 6