Prediction of mucin-type O-glycosylation sites in mammalian proteins using the composition of k-spaced amino acid pairs

Cited by: 134
Authors
Chen, Yong-Zi [1]
Tang, Yu-Rong [1]
Sheng, Zhi-Ya [1, 2]
Zhang, Ziding [1]
Affiliations
[1] China Agr Univ, Coll Biol Sci, Bioinformat Ctr, Beijing 100094, Peoples R China
[2] Natl Inst Biol Sci, Beijing 102206, Peoples R China
DOI
10.1186/1471-2105-9-101
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: As one of the most common protein post-translational modifications, glycosylation is involved in a variety of important biological processes. Computational identification of glycosylation sites in protein sequences has become increasingly important in the post-genomic era. A new encoding scheme was employed to improve the prediction of mucin-type O-glycosylation sites in mammalian proteins.
Results: A new protein bioinformatics tool, CKSAAP_OGlySite, was developed to predict mucin-type O-glycosylation serine/threonine (S/T) sites in mammalian proteins. Using an encoding scheme based on the composition of k-spaced amino acid pairs (CKSAAP), the proposed method was trained and tested on a new and stringent O-glycosylation dataset with a support vector machine (SVM). When the ratio of O-glycosylation to non-glycosylation sites in the training datasets was set to 1:1, 10-fold cross-validation tests showed that the proposed method achieved accuracies of 83.1% and 81.4% in predicting O-glycosylated S and T sites, respectively. On the same datasets, CKSAAP_OGlySite was about 5.0% more accurate than the conventional method based on binary encoding. When trained and tested on 1:5 datasets, the advantage of the CKSAAP encoding over the binary encoding was even more pronounced. We also merged the training datasets of S and T sites and integrated the prediction of S and T sites into a single predictor (i.e. the S+T predictor). On both the 1:1 and 1:5 datasets, the performance of this S+T predictor was always slightly better than that of the predictors in which S and T sites were predicted independently. This suggests that the molecular recognition of O-glycosylated S and T sites is similar, and that the increase in the S+T predictor's accuracy may result from the expanded training datasets. Moreover, CKSAAP_OGlySite also performed better when benchmarked against two existing predictors.
Conclusion: Because the CKSAAP encoding can reflect the characteristics of the sequences surrounding mucin-type O-glycosylation sites, CKSAAP_OGlySite proved more powerful than the conventional method based on binary encoding. This suggests that it can serve the biological community as a competitive mucin-type O-glycosylation site predictor. CKSAAP_OGlySite is now available at http://bioinformatics.cau.edu.cn/zzd_lab/CKSAAP_OGlySite/.
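The abstract names the CKSAAP encoding but does not spell out how a feature vector is built. The following is a minimal sketch of one common formulation, not the authors' implementation: for each gap size k, it counts every pair of residues separated by exactly k positions in a sequence fragment and normalizes by the number of such pairs. The function name `cksaap`, the default `k_max=2`, the 20-letter alphabet, and the example fragment are all assumptions made for illustration; the window size and gap range used in the paper are not given in this record.

```python
from itertools import product

# Assumption: the 20 standard amino acids, giving 400 ordered pairs per gap size.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(fragment, k_max=2):
    """Return the composition of k-spaced amino acid pairs for k = 0..k_max.

    The result is a flat list of 400 * (k_max + 1) frequencies. For each k,
    a pair is formed by residues at positions i and i + k + 1, and counts
    are divided by the number of such position pairs in the fragment.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = len(fragment) - k - 1  # number of k-spaced position pairs
        for i in range(max(total, 0)):
            pair = fragment[i] + fragment[i + k + 1]
            if pair in counts:  # skip pairs with non-standard characters
                counts[pair] += 1
        features.extend(counts[p] / total if total > 0 else 0.0 for p in PAIRS)
    return features


# Hypothetical 5-residue fragment centered on a threonine.
vector = cksaap("ASTPA", k_max=1)
print(len(vector))
```

A vector of this kind, computed for the sequence window around each candidate S or T site, could then be passed to an SVM classifier; the choice of window length and `k_max` would need to be tuned on real data.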
Pages: 12