Count-based differential expression analysis of RNA sequencing data using R and Bioconductor

Cited by: 845
Authors
Anders, Simon [1]
McCarthy, Davis J. [2, 3]
Chen, Yunshun [4, 5]
Okoniewski, Michal [6]
Smyth, Gordon K. [4, 7]
Huber, Wolfgang [1]
Robinson, Mark D. [8, 9]
Affiliations
[1] European Mol Biol Lab, Genome Biol Unit, D-69012 Heidelberg, Germany
[2] Univ Oxford, Dept Stat, Oxford OX1 3TG, England
[3] Univ Oxford, Wellcome Trust Ctr Human Genet, Oxford, England
[4] Royal Melbourne Hosp, Walter & Eliza Hall Inst Med Res, Bioinformat Div, Parkville, Vic 3050, Australia
[5] Univ Melbourne, Dept Med Biol, Melbourne, Vic, Australia
[6] Funct Genom Ctr UNI ETH, Zurich, Switzerland
[7] Univ Melbourne, Dept Math & Stat, Melbourne, Vic, Australia
[8] Univ Zurich, Inst Mol Life Sci, Zurich, Switzerland
[9] Univ Zurich, SIB Swiss Inst Bioinformat, Zurich, Switzerland
Funding
Swiss National Science Foundation; UK Medical Research Council
Keywords
GENE-EXPRESSION; SEQ DATA; STATISTICAL-METHODS; NORMALIZATION; PACKAGE; BIOINFORMATICS; TRANSCRIPTS; POWERFUL; TOPHAT; READS;
DOI
10.1038/nprot.2013.099
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
RNA sequencing (RNA-seq) has been rapidly adopted for the profiling of transcriptomes in many areas of biology, including studies into gene regulation, development and disease. Of particular interest is the discovery of differentially expressed genes across different conditions (e.g., tissues, perturbations) while optionally adjusting for other systematic factors that affect the data-collection process. There are a number of subtle yet crucial aspects of these analyses, such as read counting, appropriate treatment of biological variability, quality control checks and appropriate setup of statistical modeling. Several variations have been presented in the literature, and there is a need for guidance on current best practices. This protocol presents a state-of-the-art computational and statistical RNA-seq differential expression analysis workflow largely based on the free open-source R language and Bioconductor software and, in particular, on two widely used tools, DESeq and edgeR. Hands-on time for typical small experiments (e.g., 4-10 samples) can be < 1 h, with computation time < 1 d using a standard desktop PC.
Pages: 1765-1786
Number of pages: 22