Removing technical variability in RNA-seq data using conditional quantile normalization

Cited by: 416
Authors
Hansen, Kasper D. [2 ]
Irizarry, Rafael A. [2 ]
Wu, Zhijin [1 ]
Affiliations
[1] Brown Univ, Dept Biostat, Providence, RI 02912 USA
[2] Johns Hopkins Bloomberg Sch Publ Hlth, Dept Biostat, Baltimore, MD USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
Gene expression; Normalization; RNA sequencing; Differential expression analysis; Model
DOI
10.1093/biostatistics/kxr054
Chinese Library Classification
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
The ability to measure gene expression on a genome-wide scale is one of the most promising accomplishments in molecular biology. Microarrays, the technology that first permitted this, were riddled with problems due to unwanted sources of variability. Many of these problems are now mitigated, after a decade's worth of statistical methodology development. The recently developed RNA sequencing (RNA-seq) technology has generated much excitement in part due to claims of reduced variability in comparison to microarrays. However, we show that RNA-seq data demonstrate unwanted and obscuring variability similar to what was first observed in microarrays. In particular, we find guanine-cytosine content (GC-content) has a strong sample-specific effect on gene expression measurements that, if left uncorrected, leads to false positives in downstream results. We also report on commonly observed data distortions that demonstrate the need for data normalization. Here, we describe a statistical methodology that improves precision by 42% without loss of accuracy. Our resulting conditional quantile normalization algorithm combines robust generalized regression to remove systematic bias introduced by deterministic features such as GC-content and quantile normalization to correct for global distortions.
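As a rough illustration of the quantile normalization component named in the abstract, the sketch below forces every sample (column) of a matrix to share one empirical distribution. It is a minimal sketch of plain quantile normalization only, not the paper's conditional quantile normalization: the robust regression on GC-content is omitted, tied values are not averaged, and the function name `quantile_normalize` and the genes-by-samples layout are assumptions made for this example.

```python
import numpy as np


def quantile_normalize(x):
    """Give every column (sample) of x the same empirical distribution.

    Assumes rows are genes and columns are samples (an assumption of
    this sketch). Ties within a column are broken by position rather
    than averaged.
    """
    x = np.asarray(x, dtype=float)
    # Per-column ordering of the genes, from smallest to largest value.
    order = np.argsort(x, axis=0, kind="stable")
    sorted_x = np.take_along_axis(x, order, axis=0)
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = sorted_x.mean(axis=1)
    # Write the reference value for rank k back to the gene holding rank k.
    out = np.empty_like(x)
    np.put_along_axis(out, order, reference[:, None], axis=0)
    return out


if __name__ == "__main__":
    # Toy log-expression matrix: 4 genes (rows) by 3 samples (columns).
    toy = np.array([[5.0, 4.0, 3.0],
                    [2.0, 1.0, 4.0],
                    [3.0, 4.0, 6.0],
                    [4.0, 2.0, 8.0]])
    print(quantile_normalize(toy))
```

After this step every column contains the same set of values, so global differences in distribution between samples are removed while the within-sample ranking of genes is kept.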
Pages: 204-216
Number of pages: 13