COSSMO: predicting competitive alternative splice site selection using deep learning

Citations: 33
Authors
Bretschneider, Hannes [1, 2]
Gandhi, Shreshth [1]
Deshwar, Amit G. [1, 3]
Zuberi, Khalid [1]
Frey, Brendan J. [1, 2, 3]
Affiliations
[1] Deep Genom Inc, Toronto, ON M5G 1L7, Canada
[2] Univ Toronto, Dept Comp Sci, Toronto, ON M5S 2E4, Canada
[3] Univ Toronto, Edward S Rogers Sr Dept Elect & Comp Engn, Toronto, ON M5S 2E3, Canada
Keywords
SEQUENCE; CLONING; MOTIFS;
DOI
10.1093/bioinformatics/bty244
Chinese Library Classification
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Motivation: Alternative splice site selection is inherently competitive: the probability that a given splice site is used also depends on the strength of neighboring sites. Here, we present a new model named the competitive splice site model (COSSMO), which explicitly accounts for these competitive effects and predicts the percent selected index (PSI) distribution over any number of putative splice sites. We model an alternative splicing event as the choice of a 3' acceptor site conditional on a fixed upstream 5' donor site, or the choice of a 5' donor site conditional on a fixed 3' acceptor site. We build four different architectures that use convolutional layers, communication layers, long short-term memory and residual networks, respectively, to learn relevant motifs from sequence alone. We also construct a new dataset from genome annotations and RNA-Seq read data that we use to train our model.
Results: COSSMO predicts the most frequently used splice site with an accuracy of 70% on unseen test data, and achieves an R² of 0.6 in modeling the PSI distribution. We visualize the motifs that COSSMO learns from sequence and show that COSSMO recognizes the consensus splice site sequences and many known splicing factors with high specificity.
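The competitive framing in the abstract can be illustrated with a small sketch. It assumes, purely for illustration, that each putative splice site gets a scalar score and that a softmax turns those scores into a PSI distribution; the abstract does not state which normalization COSSMO uses, and the names `psi_distribution` and `top_site` are hypothetical.

```python
import math


def psi_distribution(scores):
    """Turn per-site scores into a distribution that sums to 1.

    Assumption: a softmax is used as the normalizer. It works for any
    number of candidate sites, which mirrors the "any number of putative
    splice sites" property described in the abstract.
    """
    if not scores:
        raise ValueError("need at least one candidate splice site")
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_site(psi):
    """Index of the most frequently used site under the distribution."""
    return max(range(len(psi)), key=psi.__getitem__)


# Hypothetical scores for three competing acceptor sites.
psi = psi_distribution([2.0, 0.5, -1.0])

# Strengthening a neighboring site lowers the share of the first site,
# even though the first site's own score is unchanged.
psi_stronger_neighbor = psi_distribution([2.0, 3.0, -1.0])
```

Because the shares must sum to one, a site's predicted usage depends on its competitors as well as on its own score, which is the effect the model is designed to capture.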
Pages: 429-437
Page count: 9