The R package Rsubread is easier, faster, cheaper and better for alignment and quantification of RNA sequencing reads

Cited by: 1787

Authors:

Liao, Yang [1,2]

Smyth, Gordon K. [1,3]

Shi, Wei [1,4]

Affiliations:

[1] Walter & Eliza Hall Inst Med Res, Bioinformat Div, 1G Royal Parade, Parkville, Vic 3052, Australia

[2] Univ Melbourne, Dept Med Biol, Parkville, Vic 3010, Australia

[3] Univ Melbourne, Sch Math & Stat, Parkville, Vic 3010, Australia

[4] Univ Melbourne, Sch Comp & Informat Syst, Parkville, Vic 3010, Australia

Source:

NUCLEIC ACIDS RESEARCH | 2019 / Vol. 47 / Issue 8

Funding:

National Health and Medical Research Council (Australia); Medical Research Council (UK);

Keywords:

SEQ DATA; SPLICE JUNCTIONS; ALIGNER;

DOI:

10.1093/nar/gkz114

Chinese Library Classification:

Q5 [Biochemistry]; Q7 [Molecular Biology];

Discipline classification codes:

070307 [Chemical Biology]; 071010 [Biochemistry and Molecular Biology];

Abstract:

We present Rsubread, a Bioconductor software package that provides high-performance alignment and read counting functions for RNA-seq reads. Rsubread is based on the successful Subread suite with the added ease-of-use of the R programming environment, creating a matrix of read counts directly as an R object ready for downstream analysis. It integrates read mapping and quantification in a single package and has no software dependencies other than R itself. We demonstrate Rsubread's ability to detect exon-exon junctions de novo and to quantify expression at the level of either genes, exons or exon junctions. The resulting read counts can be input directly into a wide range of downstream statistical analyses using other Bioconductor packages. Using SEQC data and simulations, we compare Rsubread to TopHat2, STAR and HTSeq as well as to counting functions in the Bioconductor infrastructure packages. We consider the performance of these tools on the combined quantification task starting from raw sequence reads through to summary counts, and in particular evaluate the performance of different combinations of alignment and counting algorithms. We show that Rsubread is faster and uses less memory than competitor tools and produces read count summaries that more accurately correlate with true values.
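
The quantification step described in the abstract, turning aligned reads into a matrix of gene-level counts, can be illustrated with a small toy sketch. This is not Rsubread's implementation or API: the annotation, the read coordinates, the function names, and the rule of skipping reads that overlap more than one gene are all assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical toy annotation: gene -> exons as (chrom, start, end),
# with 1-based inclusive coordinates. Not taken from the paper.
EXONS = {
    "geneA": [("chr1", 100, 200), ("chr1", 300, 400)],
    "geneB": [("chr1", 380, 500)],
}


def overlaps(a_start, a_end, b_start, b_end):
    """Return True if two closed intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def assign_read(chrom, start, end, exons=EXONS):
    """Assign one aligned read to a gene, or to a reason it was not counted."""
    hits = {
        gene
        for gene, gene_exons in exons.items()
        for (c, s, e) in gene_exons
        if c == chrom and overlaps(start, end, s, e)
    }
    if not hits:
        return "no_feature"
    if len(hits) > 1:
        # Assumed rule: reads overlapping several genes are left uncounted.
        return "ambiguous"
    return next(iter(hits))


def count_reads(samples, exons=EXONS):
    """Build a gene-by-sample count matrix from per-sample read lists."""
    genes = sorted(exons)
    matrix = {gene: [] for gene in genes}
    for reads in samples.values():
        tally = Counter(assign_read(*read, exons=exons) for read in reads)
        for gene in genes:
            matrix[gene].append(tally.get(gene, 0))
    return list(samples), matrix


if __name__ == "__main__":
    samples = {
        "s1": [("chr1", 150, 180), ("chr1", 390, 395), ("chr1", 450, 460)],
        "s2": [("chr1", 10, 20), ("chr1", 310, 320)],
    }
    names, matrix = count_reads(samples)
    print(names)
    for gene, row in matrix.items():
        print(gene, row)
```

Each row of the resulting matrix is a gene and each column a sample, which is the shape that downstream count-based statistical analyses typically expect.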

Pages: 9
