Comparative analysis of RNA-Seq alignment algorithms and the RNA-Seq unified mapper (RUM)

Cited by: 234
Authors
Grant, Gregory R. [1 ,2 ,3 ]
Farkas, Michael H. [4 ]
Pizarro, Angel D. [2 ]
Lahens, Nicholas F. [5 ]
Schug, Jonathan [3 ]
Brunk, Brian P. [1 ]
Stoeckert, Christian J. [1 ,3 ]
Hogenesch, John B. [1 ,2 ,5 ]
Pierce, Eric A. [4 ]
Affiliations
[1] Univ Penn, Sch Med, Penn Ctr Bioinformat, Philadelphia, PA 19104 USA
[2] Univ Penn, Sch Med, Inst Translat Med & Therapeut, Philadelphia, PA 19104 USA
[3] Univ Penn, Sch Med, Dept Genet, Philadelphia, PA 19104 USA
[4] Univ Penn, Sch Med, FM Kirby Ctr Mol Ophthalmol, Philadelphia, PA 19104 USA
[5] Univ Penn, Sch Med, Dept Pharmacol, Philadelphia, PA 19104 USA
Funding
U.S. National Institutes of Health;
Keywords
SPLICE JUNCTIONS; QUANTIFICATION; ULTRAFAST; REVEALS; COMPLEX; TOOL;
DOI
10.1093/bioinformatics/btr427
Chinese Library Classification
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Motivation: A critical task in high-throughput sequencing is aligning millions of short reads to a reference genome. Alignment is especially complicated for RNA sequencing (RNA-Seq) because of RNA splicing. A number of RNA-Seq algorithms are available, and claim to align reads with high accuracy and efficiency while detecting splice junctions. RNA-Seq data are discrete in nature; therefore, with reasonable gene models and comparative metrics RNA-Seq data can be simulated to sufficient accuracy to enable meaningful benchmarking of alignment algorithms. The exercise to rigorously compare all viable published RNA-Seq algorithms has not been performed previously.
Results: We developed an RNA-Seq simulator that models the main impediments to RNA alignment, including alternative splicing, insertions, deletions, substitutions, sequencing errors and intron signal. We used this simulator to measure the accuracy and robustness of available algorithms at the base and junction levels. Additionally, we used reverse transcription-polymerase chain reaction (RT-PCR) and Sanger sequencing to validate the ability of the algorithms to detect novel transcript features such as novel exons and alternative splicing in RNA-Seq data from mouse retina. A pipeline based on BLAT was developed to explore the performance of established tools for this problem, and to compare it to the recently developed methods. This pipeline, the RNA-Seq Unified Mapper (RUM), performs comparably to the best current aligners and provides an advantageous combination of accuracy, speed and usability.
Pages: 2518-2528
Number of pages: 11