Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation

Times cited: 3649
Authors
McCarthy, Davis J. [1]
Chen, Yunshun [1,2]
Smyth, Gordon K. [1,3]
Affiliations
[1] Walter and Eliza Hall Institute of Medical Research, Bioinformatics Division, Parkville, VIC 3052, Australia
[2] University of Melbourne, Department of Medical Biology, Melbourne, VIC 3010, Australia
[3] University of Melbourne, Department of Mathematics and Statistics, Melbourne, VIC 3010, Australia
Funding
Medical Research Council (UK)
Keywords
GENE-EXPRESSION; TRANSCRIPTOME; SAGE; MODEL; BIOCONDUCTOR; VARIABILITY; POWERFUL; TESTS
DOI
10.1093/nar/gks042
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
A flexible statistical framework is developed for the analysis of read counts from RNA-Seq gene expression studies. It provides the ability to analyse complex experiments involving multiple treatment conditions and blocking variables while still taking full account of biological variation. Biological variation between RNA samples is estimated separately from the technical variation associated with sequencing technologies. Novel empirical Bayes methods allow each gene to have its own specific variability, even when there are relatively few biological replicates from which to estimate such variability. The pipeline is implemented in the edgeR package of the Bioconductor project. A case study analysis of carcinoma data demonstrates the ability of generalized linear model (GLM) methods to detect differential expression in a paired design, and even to detect tumour-specific expression changes. The case study demonstrates the need to allow for gene-specific variability, rather than assuming a common dispersion across genes or a fixed relationship between abundance and variability. Genewise dispersions de-prioritize genes with inconsistent results and allow the main analysis to focus on changes that are consistent between biological replicates. Parallel computational approaches are developed to make non-linear model fitting faster and more reliable, making the application of GLMs to genomic data more convenient and practical. Simulations demonstrate the ability of adjusted profile likelihood estimators to return accurate estimates of biological variability in complex situations. When variation is gene-specific, empirical Bayes estimators provide an advantageous compromise between the extremes of assuming a common dispersion or separate genewise dispersions. The methods developed here can also be applied to count data arising from DNA-Seq applications, including ChIP-Seq for epigenetic marks and DNA methylation analyses.
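The abstract's central idea, that genewise dispersions are estimated by adjusted profile likelihood and then moderated by empirical Bayes toward a common value, can be illustrated with a small sketch. It assumes the negative binomial model Var = mu + phi*mu^2, where the dispersion phi represents biological variation. This is a simplified illustration and not edgeR's implementation. It assumes an intercept-only design (a single group of replicates) with equal library sizes, uses the Cox-Reid adjusted profile log-likelihood, and replaces edgeR's optimizer with a grid search over phi. The moderation step takes the form of a weighted likelihood: each gene's own likelihood plus `prior_weight` times the average likelihood across genes. The function names, grid bounds, and the `prior_weight` parameter are hypothetical.

```python
import math


def nb_loglik(y, mu, phi):
    """Negative binomial log-likelihood of count y with mean mu and dispersion phi.

    Var(y) = mu + phi * mu**2; the NB 'size' parameter is r = 1/phi.
    """
    r = 1.0 / phi
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def adjusted_profile_loglik(counts, phi):
    """Cox-Reid adjusted profile log-likelihood, assuming an intercept-only
    design and equal library sizes, so the fitted mean is the sample mean."""
    n = len(counts)
    mu = sum(counts) / n
    if mu <= 0:
        raise ValueError("genes with all-zero counts must be filtered out first")
    loglik = sum(nb_loglik(y, mu, phi) for y in counts)
    # Information for the log-scale intercept: X'WX = n * mu / (1 + phi * mu)
    info = n * mu / (1.0 + phi * mu)
    return loglik - 0.5 * math.log(info)


def dispersion_grid(lo=1e-4, hi=10.0, points=200):
    """Hypothetical search grid, evenly spaced on the log scale."""
    step = (math.log(hi) - math.log(lo)) / (points - 1)
    return [math.exp(math.log(lo) + i * step) for i in range(points)]


def _argmax(values):
    return max(range(len(values)), key=lambda k: values[k])


def shrunk_dispersions(count_matrix, prior_weight, grid=None):
    """Return (common_phi, [(genewise_phi, shrunk_phi), ...]) for each gene."""
    if grid is None:
        grid = dispersion_grid()
    # Adjusted profile log-likelihood of every gene at every grid point
    apl = [[adjusted_profile_loglik(gene, phi) for phi in grid]
           for gene in count_matrix]
    # Average over genes: the 'common' likelihood shared by all genes
    common = [sum(col) / len(apl) for col in zip(*apl)]
    common_phi = grid[_argmax(common)]
    results = []
    for gene_apl in apl:
        genewise_phi = grid[_argmax(gene_apl)]
        # Weighted likelihood: the gene's own evidence plus prior_weight
        # times the common likelihood
        weighted = [g + prior_weight * c for g, c in zip(gene_apl, common)]
        results.append((genewise_phi, grid[_argmax(weighted)]))
    return common_phi, results


if __name__ == "__main__":
    genes = [[10, 12, 9, 11], [5, 40, 12, 80], [100, 130, 70, 160], [20, 25, 18, 30]]
    common_phi, per_gene = shrunk_dispersions(genes, prior_weight=4.0)
    print(f"common dispersion: {common_phi:.4f}")
    for counts, (gw, sh) in zip(genes, per_gene):
        print(counts, f"genewise={gw:.4f}", f"moderated={sh:.4f}")
```

Setting `prior_weight` to 0 recovers the unmoderated genewise estimates, and a very large weight pulls every gene to the common dispersion. Intermediate values give the compromise between these two extremes that the abstract describes.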
Pages: 4288-4297
Number of pages: 10
Number of references: 61