Normalization of RNA-seq data using factor analysis of control genes or samples

Cited by: 1278
Authors
Risso, Davide [1]
Ngai, John [2,3,4]
Speed, Terence P. [1,5,6]
Dudoit, Sandrine [1,7]
Affiliations
[1] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Dept Mol & Cell Biol, Berkeley, CA 94720 USA
[3] Univ Calif Berkeley, Helen Wills Neurosci Inst, Berkeley, CA 94720 USA
[4] Univ Calif Berkeley, Funct Genom Lab, Berkeley, CA 94720 USA
[5] Royal Melbourne Hosp, Walter & Eliza Hall Inst Med Res, Bioinformat Div, Parkville, Vic 3050, Australia
[6] Univ Melbourne, Dept Math & Stat, Melbourne, Vic 3010, Australia
[7] Univ Calif Berkeley, Div Biostat, Berkeley, CA 94720 USA
Funding
UK Medical Research Council
Keywords
LOCALLY WEIGHTED REGRESSION; MESSENGER-RNA; DIFFERENTIAL EXPRESSION; MICROARRAY DATA; SINGLE
DOI
10.1038/nbt.2931
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
Normalization of RNA-sequencing (RNA-seq) data has proven essential to ensure accurate inference of expression levels. Here, we show that usual normalization approaches mostly account for sequencing depth and fail to correct for library preparation and other more complex unwanted technical effects. We evaluate the performance of the External RNA Control Consortium (ERCC) spike-in controls and investigate the possibility of using them directly for normalization. We show that the spike-ins are not reliable enough to be used in standard global-scaling or regression-based normalization procedures. We propose a normalization strategy, called remove unwanted variation (RUV), that adjusts for nuisance technical effects by performing factor analysis on suitable sets of control genes (e.g., ERCC spike-ins) or samples (e.g., replicate libraries). Our approach leads to more accurate estimates of expression fold-changes and tests of differential expression compared to state-of-the-art normalization methods. In particular, RUV promises to be valuable for large collaborative projects involving multiple laboratories, technicians, and/or sequencing platforms.
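The abstract describes RUV as factor analysis on a set of control genes, followed by removal of the estimated unwanted factors from every gene. Below is a minimal NumPy sketch of the control-gene variant. It is not the authors' implementation, and it rests on several assumptions: counts are stored as a samples × genes matrix, expression is modeled on the log(count + 1) scale, the factor-analysis step is a truncated SVD of the gene-centered control-gene submatrix, and the adjustment is a least-squares projection. The function name `ruv_g`, its parameters, and the rank tolerance are all hypothetical. The replicate-sample variant is not shown.

```python
import numpy as np


def ruv_g(counts, control_idx, k=1, offset=1.0):
    """Hypothetical RUV-style correction using control genes (sketch only).

    counts      : (n_samples, n_genes) array of read counts.
    control_idx : column indices of control genes (e.g. ERCC spike-ins),
                  assumed to be unaffected by the biology of interest.
    k           : number of unwanted factors to estimate (a tuning choice).
    offset      : pseudo-count added before taking logs.
    Returns (W, corrected): W is (n_samples, r) with r <= k, and corrected
    is the log-scale matrix with the unwanted component removed.
    """
    y = np.log(np.asarray(counts, dtype=float) + offset)
    # Center each gene so the factors describe sample-to-sample variation,
    # not differences in average expression between genes.
    y_centered = y - y.mean(axis=0)
    # Factor analysis of the controls via SVD: the left singular vectors
    # are the estimated unwanted factors (one value per sample).
    u, s, _ = np.linalg.svd(y_centered[:, control_idx], full_matrices=False)
    # Keep at most k factors whose singular values are non-negligible.
    rank = int(np.sum(s > 1e-8 * s[0]))
    w = u[:, :min(k, rank)]
    # W has orthonormal columns, so the least-squares coefficients of every
    # gene on W are W^T y; subtract the fitted unwanted component.
    alpha = w.T @ y
    corrected = y - w @ alpha
    return w, corrected
```

In noise-free synthetic data where the controls vary only through one unwanted factor, the corrected control genes become constant across samples, while a group effect on another gene is preserved as long as it is uncorrelated with that factor. In real data, k has to be chosen. If the unwanted factor is correlated with the biology, part of the biological signal is removed along with it.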
Pages: 896-902
Number of pages: 7