A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis

Cited by: 866
Authors
Dillies, Marie-Agnes [1]
Rau, Andrea [1]
Aubert, Julie [1]
Hennequet-Antier, Christelle [1]
Jeanmougin, Marine [1]
Servant, Nicolas [1]
Keime, Celine [1]
Marot, Guillemette [1]
Castel, David [1]
Estelle, Jordi [1]
Guernec, Gregory [1]
Jagla, Bernd [1]
Jouneau, Luc [1]
Laloe, Denis [1]
Le Gall, Caroline [1]
Schaeffer, Brigitte [1]
Le Crom, Stephane [1]
Guedj, Mickael [1]
Jaffrezic, Florence [1]
Affiliations
[1] Inst Pasteur, F-75724 Paris 15, France
Keywords
high-throughput sequencing; RNA-seq; normalization; differential analysis; GENE-EXPRESSION; SEQ DATA; TRANSCRIPTOME; QUANTIFICATION; STRATEGY; REVEALS; MOUSE; ARRAY
DOI
10.1093/bib/bbs046
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
During the last 3 years, a number of approaches for the normalization of RNA sequencing data have emerged in the literature, differing both in the type of bias adjustment and in the statistical strategy adopted. However, as data continue to accumulate, there has been no clear consensus on the appropriate normalization method to be used or the impact of a chosen method on the downstream analysis. In this work, we focus on a comprehensive comparison of seven recently proposed normalization methods for the differential analysis of RNA-seq data, with an emphasis on the use of varied real and simulated datasets involving different species and experimental designs to represent data characteristics commonly observed in practice. Based on this comparison study, we propose practical recommendations on the appropriate normalization method to be used and its impact on the differential analysis of RNA-seq data.
Pages: 671-683
Number of pages: 13