Gene length corrected trimmed mean of M-values (GeTMM) processing of RNA-seq data performs similarly in intersample analyses while improving intrasample comparisons

Cited by: 124
Authors
Smid, Marcel [1]
van den Braak, Robert R. J. Coebergh [2]
van de Werken, Harmen J. G. [3,4]
van Riet, Job [3,4]
van Galen, Anne [1]
de Weerd, Vanja [1]
van der Vlugt-Daane, Michelle [1]
Bril, Sandra I. [1]
Lalmahomed, Zarina S. [2]
Kloosterman, Wigard P. [5]
Wilting, Saskia M. [1]
Foekens, John A. [1]
Ijzermans, Jan N. M. [2]
Martens, John W. M. [1,6]
Sieuwerts, Anieta M. [1,6]
Affiliations
[1] Erasmus MC Univ, Med Ctr, Erasmus MC Canc Inst, Dept Med Oncol, NL-3015 CE Rotterdam, Netherlands
[2] Erasmus MC Univ, Med Ctr, Dept Surg, NL-3015 CE Rotterdam, Netherlands
[3] Erasmus MC Univ, Med Ctr, Canc Computat Biol Ctr, Erasmus MC Canc Inst, NL-3015 CE Rotterdam, Netherlands
[4] Erasmus MC Univ, Med Ctr, Dept Urol, Erasmus MC Canc Inst, NL-3015 CE Rotterdam, Netherlands
[5] Univ Med Ctr Utrecht, Ctr Mol Med, Dept Genet, NL-3584 CX Utrecht, Netherlands
[6] Canc Genom Ctr, NL-3584 CG Utrecht, Netherlands
Keywords
RNA sequencing; Normalization methods; GeTMM; edgeR; TPM; DESeq2; Colorectal cancer; Breast cancer; Tamoxifen
DOI
10.1186/s12859-018-2246-7
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification code
070307 [Chemical Biology]
Abstract
Background: Current normalization methods for RNA-sequencing data allow either for intersample comparison to identify differentially expressed (DE) genes or for intrasample comparison for the discovery and validation of gene signatures. Most studies on optimization of normalization methods use simulated data to validate methodologies. We describe a new method, GeTMM, which allows for both inter- and intrasample analyses with the same normalized data set. We used actual (i.e., not simulated) RNA-seq data from 263 colon cancers (no biological replicates) and used the same read count data to compare GeTMM with the most commonly used normalization methods (i.e., TMM (used by edgeR), RLE (used by DESeq2) and TPM) with respect to distributions, effect of RNA quality, subtype classification, recurrence score, recall of DE genes and correlation to RT-qPCR data.
Results: We observed a clear benefit for GeTMM and TPM with regard to intrasample comparison, while GeTMM performed similarly to TMM- and RLE-normalized data in intersample comparisons. Regarding DE genes, recall was comparable among the normalization methods, while GeTMM showed the lowest number of false-positive DE genes. Remarkably, we observed limited detrimental effects in samples with low RNA quality.
Conclusions: We show that GeTMM outperforms established methods with regard to intrasample comparison while performing equivalently with regard to intersample normalization using the same normalized data. These combined properties enhance not only the general usefulness of RNA-seq but also its comparability to the many array-based gene expression data sets in the public domain.
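As a rough illustration of the gene length correction named in the title, the sketch below converts raw read counts for one sample into reads per kilobase (RPK) and, for contrast, into TPM using the standard TPM formula. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names and the toy counts and lengths are hypothetical, and the TMM scaling step (as provided by edgeR) that GeTMM combines with length correction is not reproduced here.

```python
def reads_per_kilobase(counts, lengths_bp):
    """Length-correct raw read counts: reads per kilobase of gene length.

    counts and lengths_bp are parallel lists for the genes of one sample.
    (Hypothetical helper name, used for illustration only.)
    """
    return [count / (length / 1000.0) for count, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: RPK values rescaled to sum to one million."""
    rpk = reads_per_kilobase(counts, lengths_bp)
    total = sum(rpk)
    return [value / total * 1e6 for value in rpk]


# Toy data (assumed values, not taken from the study).
counts = [100, 200, 300]        # raw read counts for three genes
lengths_bp = [1000, 2000, 6000]  # gene lengths in base pairs

rpk_values = reads_per_kilobase(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)
```

In this toy sample the third gene has the most raw reads but the lowest length-corrected value, which is why length correction matters when comparing genes within a single sample.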
Pages: 13
Related papers
26 in total
[1]
Differential expression analysis for sequence count data [J].
Anders, Simon ;
Huber, Wolfgang .
GENOME BIOLOGY, 2010, 11 (10)
[2]
Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments [J].
Bullard, James H. ;
Purdom, Elizabeth ;
Hansen, Kasper D. ;
Dudoit, Sandrine .
BMC BIOINFORMATICS, 2010, 11
[3]
Translating tumor biology into personalized treatment planning: analytical performance characteristics of the Oncotype DX® Colon Cancer Assay [J].
Clark-Langone, Kim M. ;
Sangli, Chithra ;
Krishnakumar, Jayadevi ;
Watson, Drew .
BMC CANCER, 2010, 10
[4]
A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis [J].
Dillies, Marie-Agnes ;
Rau, Andrea ;
Aubert, Julie ;
Hennequet-Antier, Christelle ;
Jeanmougin, Marine ;
Servant, Nicolas ;
Keime, Celine ;
Marot, Guillemette ;
Castel, David ;
Estelle, Jordi ;
Guernec, Gregory ;
Jagla, Bernd ;
Jouneau, Luc ;
Laloe, Denis ;
Le Gall, Caroline ;
Schaeffer, Brigitte ;
Le Crom, Stephane ;
Guedj, Mickael ;
Jaffrezic, Florence .
BRIEFINGS IN BIOINFORMATICS, 2013, 14 (06) :671-683
[5]
STAR: ultrafast universal RNA-seq aligner [J].
Dobin, Alexander ;
Davis, Carrie A. ;
Schlesinger, Felix ;
Drenkow, Jorg ;
Zaleski, Chris ;
Jha, Sonali ;
Batut, Philippe ;
Chaisson, Mark ;
Gingeras, Thomas R. .
BIOINFORMATICS, 2013, 29 (01) :15-21
[6]
The consensus molecular subtypes of colorectal cancer [J].
Guinney, Justin ;
Dienstmann, Rodrigo ;
Wang, Xin ;
de Reynies, Aurelien ;
Schlicker, Andreas ;
Soneson, Charlotte ;
Marisa, Laetitia ;
Roepman, Paul ;
Nyamundanda, Gift ;
Angelino, Paolo ;
Bot, Brian M. ;
Morris, Jeffrey S. ;
Simon, Iris M. ;
Gerster, Sarah ;
Fessler, Evelyn ;
Melo, Felipe De Sousa E. ;
Missiaglia, Edoardo ;
Ramay, Hena ;
Barras, David ;
Homicsko, Krisztian ;
Maru, Dipen ;
Manyam, Ganiraju C. ;
Broom, Bradley ;
Boige, Valerie ;
Perez-Villamil, Beatriz ;
Laderas, Ted ;
Salazar, Ramon ;
Gray, Joe W. ;
Hanahan, Douglas ;
Tabernero, Josep ;
Bernards, Rene ;
Friend, Stephen H. ;
Laurent-Puig, Pierre ;
Medema, Jan Paul ;
Sadanandam, Anguraj ;
Wessels, Lodewyk ;
Delorenzi, Mauro ;
Kopetz, Scott ;
Vermeulen, Louis ;
Tejpar, Sabine .
NATURE MEDICINE, 2015, 21 (11) :1350-1356
[7]
A Systematic Analysis of Oncogenic Gene Fusions in Primary Colon Cancer [J].
Kloosterman, Wigard P. ;
van den Braak, Robert R. J. Coebergh ;
Pieterse, Mark ;
van Roosmalen, Markus J. ;
Sieuwerts, Anieta M. ;
Stangl, Christina ;
Brunekreef, Ronne ;
Lalmahomed, Zarina S. ;
Ooft, Salo ;
van Galen, Anne ;
Smid, Marcel ;
Lefebvre, Armel ;
Zwartkruis, Fried ;
Martens, John W. M. ;
Foekens, John A. ;
Biermann, Katharina ;
Koudijs, Marco J. ;
Ijzermans, Jan N. M. ;
Voest, Emile E. .
CANCER RESEARCH, 2017, 77 (14) :3814-3822
[8]
RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome [J].
Li, Bo ;
Dewey, Colin N. .
BMC BIOINFORMATICS, 2011, 12
[9]
RNA-Seq gene expression estimation with read mapping uncertainty [J].
Li, Bo ;
Ruotti, Victor ;
Stewart, Ron M. ;
Thomson, James A. ;
Dewey, Colin N. .
BIOINFORMATICS, 2010, 26 (04) :493-500
[10]
Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data [J].
Li, Peipei ;
Piao, Yongjun ;
Shon, Ho Sun ;
Ryu, Keun Ho .
BMC BIOINFORMATICS, 2015, 16