Prediction, conservation analysis, and structural characterization of mammalian mucin-type O-glycosylation sites

Times cited: 738
Authors
Julenius, K [1]
Molgaard, A [1]
Gupta, R [1]
Brunak, S [1]
Affiliation
[1] Tech Univ Denmark, Biocentrum, Ctr Biol Sequence Anal, DK-2800 Lyngby, Denmark
Keywords
machine learning; mucin-type; neural networks; O-glycosylation; prediction
DOI
10.1093/glycob/cwh151
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
O-GalNAc glycosylation is one of the main types of glycosylation in mammalian cells. No consensus recognition sequence for the O-glycosyltransferases is known, so prediction methods are needed to bridge the gap between the large number of known protein sequences and the small number of proteins whose glycosylation status has been investigated experimentally. A total of 86 mammalian proteins experimentally investigated for in vivo O-GalNAc sites were extracted from O-GLYCBASE. Comparisons of mammalian protein homologs showed that a glycosylated serine or threonine is less likely to be precisely conserved than a nonglycosylated one. The Protein Data Bank was analyzed for structural information, yielding 12 glycosylated structures; all positive sites were found in coil or turn regions. A method for predicting the locations of mucin-type glycosylation sites was trained using a neural network approach. The best overall network took as input the amino acid composition and averaged surface accessibility predictions, together with a substitution-matrix profile encoding of the sequence. To improve prediction of isolated (single) sites, networks were trained on isolated sites only. The final method, NetOGlyc 3.1, combines predictions from the best overall network and the best isolated-site network; it correctly predicted 76% of the glycosylated residues and 93% of the nonglycosylated residues. NetOGlyc 3.1 can predict sites in completely new proteins without loss of performance. That sites can be predicted from averaged properties, together with the observation that glycosylation sites are not precisely conserved, indicates that mucin-type glycosylation is in most cases a bulk property rather than a highly site-specific one. NetOGlyc 3.1 is available at www.cbs.dtu.dk/services/netoglyc.
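The abstract names amino acid composition as one of the network inputs and reports accuracy separately for glycosylated and nonglycosylated residues. The sketch below illustrates both ideas under stated assumptions: it computes the 20-letter composition of a symmetric window centred on each candidate serine or threonine, and the two per-class accuracy figures (sensitivity and specificity). The window half-width of 10 and the helper names `candidate_sites`, `window_composition`, and `sensitivity_specificity` are hypothetical choices for illustration, not details of NetOGlyc 3.1; the surface-accessibility and substitution-matrix profile inputs, and the networks themselves, are not shown.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors line up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def candidate_sites(sequence: str) -> list[int]:
    """0-based positions of serine (S) and threonine (T), the O-GalNAc acceptor residues."""
    return [i for i, aa in enumerate(sequence) if aa in "ST"]


def window_composition(sequence: str, position: int, half_width: int = 10) -> list[float]:
    """Fraction of each standard amino acid in a window centred on `position`.

    The window includes the centre residue and is truncated at the sequence ends.
    half_width=10 is an illustrative assumption, not a published NetOGlyc parameter.
    Nonstandard letters (e.g. X) count toward the window length but get no slot.
    """
    start = max(0, position - half_width)
    end = min(len(sequence), position + half_width + 1)
    window = sequence[start:end]
    counts = Counter(window)
    return [counts[aa] / len(window) for aa in AMINO_ACIDS]


def sensitivity_specificity(labels: list[bool], predictions: list[bool]) -> tuple[float, float]:
    """Return (fraction of glycosylated sites predicted positive,
    fraction of nonglycosylated sites predicted negative).

    Assumes at least one positive and one negative label.
    """
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)
    fn = sum(1 for y, p in pairs if y and not p)
    tn = sum(1 for y, p in pairs if not y and not p)
    fp = sum(1 for y, p in pairs if not y and p)
    return tp / (tp + fn), tn / (tn + fp)
```

For example, `window_composition("AASTPPG", 2, half_width=1)` covers the window "AST" and assigns 1/3 each to A, S, and T. Averaging over a window rather than encoding individual positions fits the abstract's conclusion that mucin-type glycosylation behaves largely as a bulk property of the surrounding sequence.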
Pages: 153-164
Number of pages: 12