A new shrinkage estimator for dispersion improves differential expression detection in RNA-seq data

Cited by: 153
Authors
Wu, Hao [1]
Wang, Chi [2, 3]
Wu, Zhijin [4]
Affiliations
[1] Emory Univ, Dept Biostat & Bioinformat, Atlanta, GA 30322 USA
[2] Univ Kentucky, Dept Biostat, Lexington, KY 40536 USA
[3] Univ Kentucky, Markey Canc Ctr, Lexington, KY 40536 USA
[4] Brown Univ, Dept Biostat, Providence, RI 02912 USA
Funding
U.S. National Science Foundation
Keywords
Differential expression; Empirical Bayes; RNA sequencing; Shrinkage estimator; GENE; REPRODUCIBILITY; PACKAGE;
DOI
10.1093/biostatistics/kxs033
Chinese Library Classification number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Recent developments in RNA-sequencing (RNA-seq) technology have led to a rapid increase in gene expression data in the form of counts. RNA-seq can be used for a variety of applications; however, identifying differential expression (DE) remains a key task in functional genomics. A number of statistical methods have been proposed for DE detection in RNA-seq data. One common feature of several leading methods is the use of the negative binomial (gamma-Poisson mixture) model. That is, the unobserved gene expression is modeled by a gamma random variable and, given the expression, the sequencing read counts are modeled as Poisson. What distinguishes the methods is how the variance, or dispersion, of the gamma distribution is modeled and estimated. We evaluate several large public RNA-seq datasets and find that the estimated dispersion in existing methods does not adequately capture the heterogeneity of biological variance among samples. We present a new empirical Bayes shrinkage estimate of the dispersion parameters and demonstrate improved DE detection.
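The gamma-Poisson construction described in the abstract can be illustrated with a short simulation. This is a minimal sketch and is not taken from the paper: the function names, the parameter values (mean 50, dispersion 0.2), and the parameterization of dispersion as the squared coefficient of variation of the gamma variable are assumptions chosen for illustration. It does not implement the shrinkage estimator itself. It only checks that mixing a Poisson over a gamma-distributed rate gives counts whose variance is close to mean + dispersion × mean², the negative binomial variance.

```python
import math
import random
import statistics


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) count with Knuth's multiplication method.

    Adequate for the moderate rates used in this sketch.
    """
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_gamma_poisson(mean, dispersion, size, seed=0):
    """Simulate counts from a gamma-Poisson mixture (hypothetical helper).

    The latent expression is gamma distributed with the given mean and
    squared coefficient of variation `dispersion`; given the latent
    expression, the count is Poisson.
    """
    rng = random.Random(seed)
    shape = 1.0 / dispersion   # gamma shape
    scale = mean * dispersion  # gamma scale, so shape * scale == mean
    return [poisson_draw(rng, rng.gammavariate(shape, scale))
            for _ in range(size)]


mean, dispersion = 50.0, 0.2
counts = simulate_gamma_poisson(mean, dispersion, size=20_000)

sample_mean = statistics.fmean(counts)
sample_var = statistics.pvariance(counts)
# Negative binomial variance implied by the mixture: mean + dispersion * mean^2
theoretical_var = mean + dispersion * mean ** 2

print(sample_mean, sample_var, theoretical_var)
```

Under this parameterization a dispersion of zero would reduce the model to a plain Poisson, so the dispersion term is what carries the extra, biological variability between samples that the abstract refers to.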
Pages: 232-243
Page count: 12