Simultaneous Isoform Discovery and Quantification from RNA-Seq

Cited by: 17
Authors
Hiller D. [1]
Wong W.H. [2]
Affiliations
[1] Center for Epigenetics, Johns Hopkins School of Medicine, 855 N. Wolfe St., Baltimore, MD 21205
[2] Department of Statistics, Stanford University, Sequoia Hall, Stanford, CA 94305
Funding
National Institutes of Health (USA)
Keywords
Algorithms; Alternative splicing; Isoform discovery; Monte Carlo; RNA-seq
DOI
10.1007/s12561-012-9069-2
Abstract
RNA sequencing is a recent technology which has seen an explosion of methods addressing all levels of analysis, from read mapping to transcript assembly to differential expression modeling. In particular, the discovery of isoforms at the transcript assembly stage is a complex problem, and current approaches suffer from various limitations. For instance, many approaches use graphs to construct a minimal set of isoforms which covers the observed reads, then run a separate algorithm to quantify the isoforms, which can result in a loss of power. Current methods also use ad hoc solutions to deal with the vast number of possible isoforms which can be constructed from a given set of reads. Finally, while the need to take into account features such as read pairing and the sampling rate of reads has been acknowledged, most existing methods do not seamlessly integrate these features as part of the model. We present Montebello, an integrated statistical approach which performs simultaneous isoform discovery and quantification by using a Monte Carlo simulation to find the most likely isoform composition leading to a set of observed reads. We compare Montebello to Cufflinks, a popular isoform discovery approach, on a simulated data set and on 46.3 million brain reads from an Illumina tissue panel. On this data set, Montebello appears to offer a modest improvement over Cufflinks when considering discovery and parsimony metrics. In addition, Montebello mitigates specific difficulties inherent in the Cufflinks approach. Finally, Montebello can be fine-tuned depending on the type of solution desired. © 2012 International Chinese Statistical Association.
Pages: 100-118
Number of pages: 18