An Annotation Agnostic Algorithm for Detecting Nascent RNA Transcripts in GRO-Seq

Cited by: 16
Authors
Azofeifa, Joseph G. [1]
Allen, Mary A. [2]
Lladser, Manuel E. [3]
Dowell, Robin D. [2, 4]
Affiliations
[1] Univ Colorado, Dept Comp Sci, Boulder, CO 80309 USA
[2] Univ Colorado, BioFrontiers Inst, Boulder, CO 80309 USA
[3] Univ Colorado, Dept Appl Math, Boulder, CO 80309 USA
[4] Univ Colorado, Dept Mol Cellular & Dev Biol, Boulder, CO 80309 USA
Keywords
GRO-seq; nascent transcription; logistic regression; hidden Markov models; algorithms; experimentation; ELONGATION; INITIATION; ENHANCERS; ELEMENTS; MODELS; SITES; MAP
DOI
10.1109/TCBB.2016.2520919
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
We present a fast and simple algorithm to detect nascent RNA transcription in global nuclear run-on sequencing (GRO-seq). GRO-seq is a relatively new protocol that captures nascent transcripts from actively engaged polymerase, providing a direct read-out on bona fide transcription. Most traditional assays, such as RNA-seq, measure steady-state RNA levels, which are affected by transcription, post-transcriptional processing, and RNA stability. GRO-seq data, however, present unique analysis challenges that are only beginning to be addressed. Here, we describe a new algorithm, Fast Read Stitcher (FStitch), that takes advantage of two popular machine-learning techniques, hidden Markov models and logistic regression, to classify which regions of the genome are transcribed. Given a small user-defined training set, our algorithm is accurate, robust to varying read depth, annotation agnostic, and fast. Analysis of GRO-seq data without a priori need for annotation uncovers surprising new insights into several aspects of the transcription process.
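The abstract names the two ingredients (logistic regression and a hidden Markov model) but not how they are combined. The following is a minimal illustrative sketch, not the FStitch implementation: it assumes the genome is cut into fixed bins, that the only feature is log(1 + read coverage) per bin, that the logistic-regression output is used directly as the per-bin score for a two-state (inactive/active) model, and that state changes are discouraged by a single hypothetical `stay` probability. All function names, the toy data, and the parameter values are assumptions made for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit P(active | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b, n = 0.0, 0.0, len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

def viterbi(p_active, stay=0.9):
    """Most likely 0/1 state path given per-bin P(active) and sticky transitions."""
    eps = 1e-12
    log_stay, log_switch = math.log(stay), math.log(1.0 - stay)

    def emit(p):
        # Log score of a bin under state 0 (inactive) and state 1 (active).
        return (math.log(max(1.0 - p, eps)), math.log(max(p, eps)))

    score = [math.log(0.5) + e for e in emit(p_active[0])]
    back = []
    for p in p_active[1:]:
        e = emit(p)
        new, ptr = [], []
        for s in (0, 1):
            cands = [score[r] + (log_stay if r == s else log_switch) for r in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            new.append(cands[best] + e[s])
            ptr.append(best)
        score = new
        back.append(ptr)
    # Trace back from the better final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

def call_regions(path):
    """Merge runs of state 1 into half-open (start_bin, end_bin) intervals."""
    regions, start = [], None
    for i, s in enumerate(path):
        if s == 1 and start is None:
            start = i
        elif s == 0 and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(path)))
    return regions

# Toy "user-defined training set": coverage of a few labeled bins (made-up numbers).
train_cov = [0, 1, 0, 2, 30, 45, 25, 60]
train_lab = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic([math.log1p(c) for c in train_cov], train_lab)

# Classify an unlabeled stretch of bins and stitch active bins into regions.
coverage = [0, 1, 0, 40, 35, 2, 50, 42, 0, 0, 1]
p_active = [sigmoid(w * math.log1p(c) + b) for c in coverage]
path = viterbi(p_active)
regions = call_regions(path)
print(regions)
```

In this sketch the transition penalty is what lets a single low-coverage bin inside a covered stretch be absorbed into the surrounding call rather than splitting it; whether that happens for a given bin depends on the fitted weights and on `stay`.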
Pages: 1070-1081
Page count: 12