RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

被引:13979
作者
Li, Bo [1 ]
Dewey, Colin N. [1 ,2 ]
机构
[1] Univ Wisconsin, Dept Comp Sci, Madison, WI 53706 USA
[2] Univ Wisconsin, Dept Biostat & Med Informat, Madison, WI USA
来源
BMC BIOINFORMATICS | 2011年 / 12卷
关键词
DIFFERENTIAL EXPRESSION ANALYSIS; ISOFORM EXPRESSION; SPLICE JUNCTIONS; INFERENCE; REPRODUCIBILITY; SEQUENCES; BROWSER; REVEALS; TOOL;
D O I
10.1186/1471-2105-12-323
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Background: RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments. Results: We present RSEM, an user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene. Conclusions: RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has enabled valuable guidance for cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.
引用
收藏
页数:16
相关论文
共 42 条
[1]   Differential expression analysis for sequence count data [J].
Anders, Simon ;
Huber, Wolfgang .
GENOME BIOLOGY, 2010, 11 (10)
[2]  
[Anonymous], Flux simulator
[3]   Detection of splice junctions from paired-end RNA-seq data by SpliceMap [J].
Au, Kin Fai ;
Jiang, Hui ;
Lin, Lan ;
Xing, Yi ;
Wong, Wing Hung .
NUCLEIC ACIDS RESEARCH, 2010, 38 (14) :4570-4578
[4]   rQuant.web: a tool for RNA-Seq-based transcript quantitation [J].
Bohnert, Regina ;
Raetsch, Gunnar .
NUCLEIC ACIDS RESEARCH, 2010, 38 :W348-W351
[5]   Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments [J].
Bullard, James H. ;
Purdom, Elizabeth ;
Hansen, Kasper D. ;
Dudoit, Sandrine .
BMC BIOINFORMATICS, 2010, 11
[6]   Why the need for qPCR publication guidelines?-The case for MIQE [J].
Bustin, Stephen A. .
METHODS, 2010, 50 (04) :217-226
[7]   Optimal spliced alignments of short sequence reads [J].
De Bona, Fabio ;
Ossowski, Stephan ;
Schneeberger, Korbinian ;
Raetsch, Gunnar .
BIOINFORMATICS, 2008, 24 (16) :I174-I180
[8]   A rescue strategy for multimapping short sequence tags refines surveys of transcriptional activity by CAGE [J].
Faulkner, Geoffrey J. ;
Forrest, Alistair R. R. ;
Chalk, Alistair M. ;
Schroder, Kate ;
Hayashizaki, Yoshihide ;
Carninci, Piero ;
Hume, David A. ;
Grimmond, Sean M. .
GENOMICS, 2008, 91 (03) :281-288
[9]   Inference of Isoforms from Short Sequence Reads [J].
Feng, Jianxing ;
Li, Wei ;
Jiang, Tao .
JOURNAL OF COMPUTATIONAL BIOLOGY, 2011, 18 (03) :305-321
[10]   Ensembl 2011 [J].
Flicek, Paul ;
Amode, M. Ridwan ;
Barrell, Daniel ;
Beal, Kathryn ;
Brent, Simon ;
Chen, Yuan ;
Clapham, Peter ;
Coates, Guy ;
Fairley, Susan ;
Fitzgerald, Stephen ;
Gordon, Leo ;
Hendrix, Maurice ;
Hourlier, Thibaut ;
Johnson, Nathan ;
Kaehaeri, Andreas ;
Keefe, Damian ;
Keenan, Stephen ;
Kinsella, Rhoda ;
Kokocinski, Felix ;
Kulesha, Eugene ;
Larsson, Pontus ;
Longden, Ian ;
McLaren, William ;
Overduin, Bert ;
Pritchard, Bethan ;
Riat, Harpreet Singh ;
Rios, Daniel ;
Ritchie, Graham R. S. ;
Ruffier, Magali ;
Schuster, Michael ;
Sobral, Daniel ;
Spudich, Giulietta ;
Tang, Y. Amy ;
Trevanion, Stephen ;
Vandrovcova, Jana ;
Vilella, Albert J. ;
White, Simon ;
Wilder, Steven P. ;
Zadissa, Amonida ;
Zamora, Jorge ;
Aken, Bronwen L. ;
Birney, Ewan ;
Cunningham, Fiona ;
Dunham, Ian ;
Durbin, Richard ;
Fernandez-Suarez, Xose M. ;
Herrero, Javier ;
Hubbard, Tim J. P. ;
Parker, Anne ;
Proctor, Glenn .
NUCLEIC ACIDS RESEARCH, 2011, 39 :D800-D806