MITIE: Simultaneous RNA-Seq-based transcript identification and quantification in multiple samples

Cited by: 40
Authors
Behr, Jonas [1, 2]
Kahles, Andre [1]
Zhong, Yi [1]
Sreedharan, Vipin T. [1]
Drewe, Philipp [1]
Raetsch, Gunnar [1]
Affiliations
[1] Sloan Kettering Inst, Computat Biol Ctr, New York, NY 10065 USA
[2] Max Planck Gesell, Friedrich Miescher Lab, D-72076 Tubingen, Germany
Keywords
ISOFORMS; RECONSTRUCTION; DECONVOLUTION; REVEALS;
DOI
10.1093/bioinformatics/btt442
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Motivation: High-throughput sequencing of mRNA (RNA-Seq) has led to tremendous improvements in the detection of expressed genes and reconstruction of RNA transcripts. However, the extensive dynamic range of gene expression, technical limitations and biases, as well as the observed complexity of the transcriptional landscape, pose profound computational challenges for transcriptome reconstruction.

Results: We present the novel framework MITIE (Mixed Integer Transcript IdEntification) for simultaneous transcript reconstruction and quantification. We define a likelihood function based on the negative binomial distribution, use a regularization approach to select a few transcripts collectively explaining the observed read data, and show how to find the optimal solution using Mixed Integer Programming. MITIE can (i) take advantage of known transcripts, (ii) reconstruct and quantify transcripts simultaneously in multiple samples, and (iii) resolve the location of multi-mapping reads. It is designed for genome- and assembly-based transcriptome reconstruction. We present an extensive study based on realistic simulated RNA-Seq data. When compared with state-of-the-art approaches, MITIE proves to be significantly more sensitive and overall more accurate. Moreover, MITIE yields substantial performance gains when used with multiple samples. We applied our system to 38 Drosophila melanogaster modENCODE RNA-Seq libraries and estimated the sensitivity of reconstructing omitted transcript annotations and the specificity with respect to annotated transcripts. Our results corroborate that a well-motivated objective paired with appropriate optimization techniques leads to significant improvements over the state-of-the-art in transcriptome reconstruction.
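
The idea summarized in the abstract (a negative binomial likelihood plus a penalty that favors few transcripts, minimized over discrete choices) can be illustrated with a toy sketch. This is not MITIE's actual formulation or code: the segment counts, the candidate transcripts, the abundance grid, the dispersion value `r`, and the function names are all assumptions made for illustration, and an exhaustive search over a small grid stands in for a Mixed Integer Programming solver. It also handles a single sample only.

```python
import itertools
import math


def nb_neg_log_lik(k, mu, r=10.0):
    """Negative log-likelihood of count k under a negative binomial
    with mean mu and dispersion r (r is an assumed, fixed value)."""
    mu = max(mu, 1e-9)  # keep log() finite when no transcript covers a segment
    log_p = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
             + r * math.log(r / (r + mu))
             + k * math.log(mu / (r + mu)))
    return -log_p


def best_transcript_set(counts, candidates, levels, penalty):
    """Exhaustively search abundance assignments (one level per candidate).

    Loss = NB negative log-likelihood of the segment counts
           + penalty * (number of candidates with non-zero abundance).
    Brute force replaces the MIP solver here, so this only scales to toys.
    """
    best_loss, best_abund = math.inf, None
    for abund in itertools.product(levels, repeat=len(candidates)):
        # Expected coverage of each segment is the sum of the abundances
        # of all candidate transcripts that contain that segment.
        expected = [sum(a * t[s] for a, t in zip(abund, candidates))
                    for s in range(len(counts))]
        loss = sum(nb_neg_log_lik(k, mu) for k, mu in zip(counts, expected))
        loss += penalty * sum(1 for a in abund if a > 0)
        if loss < best_loss:
            best_loss, best_abund = loss, abund
    return best_loss, best_abund


# Hypothetical gene with three segments and three candidate transcripts,
# each given as a 0/1 vector marking which segments it contains.
counts = [50, 10, 50]
candidates = [
    (1, 1, 1),  # T1: all segments
    (1, 0, 1),  # T2: skips the middle segment
    (0, 1, 1),  # T3: starts at the middle segment
]
levels = [0, 10, 20, 30, 40, 50]

loss, abundances = best_transcript_set(counts, candidates, levels, penalty=1.0)
print(abundances)
```

With a small penalty the search keeps the two transcripts that reproduce the counts exactly; raising `penalty` makes a single transcript preferable, which is the trade-off a sparsity-inducing regularizer is meant to control.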
Pages: 2529-2538
Page count: 10