A machine learning strategy to identify candidate binding sites in human protein-coding sequence

Cited by: 8
Authors
Down, Thomas [1]
Leong, Bernard [1]
Hubbard, Tim J. P. [1]
Affiliations
[1] Wellcome Trust Sanger Inst, Cambridge CB10 1SA, England
Funding
Wellcome Trust (UK)
DOI
10.1186/1471-2105-7-419
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: The splicing of RNA transcripts is thought to be partly promoted and regulated by sequences embedded within exons. Known sequences include binding sites for SR proteins, which are thought to mediate interactions between splicing factors bound to the 5' and 3' splice sites. It would be useful to identify further candidate sequences; however, identifying them computationally is hard because exon sequences are also constrained by their functional role in coding for proteins.

Results: This strategy identified a collection of motifs, including several previously reported splice enhancer elements. Although trained only on coding exons, the model discriminates both coding and non-coding exons from intragenic sequence.

Conclusion: We have trained a computational model able to detect signals in coding exons which seem to be orthogonal to the sequences' primary function of coding for proteins. We believe that many of the motifs detected here represent binding sites for previously unrecognized proteins which influence RNA splicing, as well as other regulatory elements.
Pages: 13