Prediction, conservation analysis, and structural characterization of mammalian mucin-type O-glycosylation sites

Cited by: 738
Authors
Julenius, K [1]
Molgaard, A [1]
Gupta, R [1]
Brunak, S [1]
Affiliations
[1] Tech Univ Denmark, Biocentrum, Ctr Biol Sequence Anal, DK-2800 Lyngby, Denmark
Keywords
machine learning; mucin-type; neural networks; O-glycosylation; prediction
DOI
10.1093/glycob/cwh151
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
O-GalNAc-glycosylation is one of the main types of glycosylation in mammalian cells. No consensus recognition sequence for the O-glycosyltransferases is known, making prediction methods necessary to bridge the gap between the large number of known protein sequences and the small number of proteins experimentally investigated with regard to glycosylation status. From O-GLYCBASE a total of 86 mammalian proteins experimentally investigated for in vivo O-GalNAc sites were extracted. Mammalian protein homolog comparisons showed that a glycosylated serine or threonine is less likely to be precisely conserved than a nonglycosylated one. The Protein Data Bank was analyzed for structural information, and 12 glycosylated structures were obtained. All positive sites were found in coil or turn regions. A method for predicting the location for mucin-type glycosylation sites was trained using a neural network approach. The best overall network used as input amino acid composition, averaged surface accessibility predictions together with substitution matrix profile encoding of the sequence. To improve prediction on isolated (single) sites, networks were trained on isolated sites only. The final method combines predictions from the best overall network and the best isolated site network; this prediction method correctly predicted 76% of the glycosylated residues and 93% of the nonglycosylated residues. NetOGlyc 3.1 can predict sites for completely new proteins without losing its performance. The fact that the sites could be predicted from averaged properties together with the fact that glycosylation sites are not precisely conserved indicates that mucin-type glycosylation in most cases is a bulk property and not a very site-specific one. NetOGlyc 3.1 is made available at www.cbs.dtu.dk/services/netoglyc.
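The abstract names amino acid composition as one network input and reports performance as the fraction of glycosylated and nonglycosylated residues predicted correctly. The sketch below illustrates those two ideas only; it is not the NetOGlyc implementation. The function names, the window half-width of 10 residues, and the toy sequence are assumptions made for illustration and do not come from the record.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def candidate_sites(sequence):
    """Return 0-based positions of Ser/Thr residues, the possible O-GalNAc sites."""
    return [i for i, aa in enumerate(sequence) if aa in "ST"]


def window_composition(sequence, position, half_width=10):
    """Fraction of each amino acid in a window centred on `position`.

    The window is truncated at the sequence ends. The half-width is an
    illustrative assumption, not a value taken from the paper.
    """
    start = max(0, position - half_width)
    end = min(len(sequence), position + half_width + 1)
    window = sequence[start:end]
    counts = Counter(window)
    return {aa: counts.get(aa, 0) / len(window) for aa in AMINO_ACIDS}


def sensitivity_specificity(true_labels, predicted_labels):
    """Return (fraction of positives found, fraction of negatives found).

    Labels are booleans or 0/1: True means glycosylated. A fraction is
    NaN when the corresponding class is absent from `true_labels`.
    """
    pairs = list(zip(true_labels, predicted_labels))
    positives = sum(1 for t, _ in pairs if t)
    negatives = len(pairs) - positives
    true_pos = sum(1 for t, p in pairs if t and p)
    true_neg = sum(1 for t, p in pairs if not t and not p)
    sens = true_pos / positives if positives else float("nan")
    spec = true_neg / negatives if negatives else float("nan")
    return sens, spec


if __name__ == "__main__":
    seq = "AAPTTSPAA"  # toy sequence, not a real protein
    for pos in candidate_sites(seq):
        comp = window_composition(seq, pos, half_width=2)
        print(pos, seq[pos], {aa: f for aa, f in comp.items() if f > 0})
```

In these terms, the 76% and 93% quoted in the abstract correspond to the first and second values returned by `sensitivity_specificity`, computed over all Ser/Thr residues in the evaluation proteins.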
Pages: 153-164
Page count: 12