Multi-Omics Factor Analysis: a framework for unsupervised integration of multi-omics data sets

Cited by: 648
Authors
Argelaguet, Ricard [1]
Velten, Britta [2]
Arnol, Damien [1]
Dietrich, Sascha [3]
Zenz, Thorsten [3, 4, 5, 6, 7]
Marioni, John C. [1, 8, 9]
Buettner, Florian [1, 10]
Huber, Wolfgang [2]
Stegle, Oliver [1, 2]
Affiliations
[1] European Mol Biol Lab, European Bioinformat Inst, Cambridge, England
[2] European Mol Biol Lab, Heidelberg, Germany
[3] Heidelberg Univ Hosp, Heidelberg, Germany
[4] German Canc Res Ctr, Heidelberg, Germany
[5] Natl Ctr Tumor Dis NCT, Heidelberg, Germany
[6] Univ Hosp Zurich, Dept Med Oncol & Hematol, Zurich, Switzerland
[7] Univ Zurich, Zurich, Switzerland
[8] Univ Cambridge, Canc Res UK Cambridge Inst, Cambridge, England
[9] Wellcome Trust Sanger Inst, Cambridge, England
[10] German Res Ctr Environm Hlth, Helmholtz Zentrum Munchen, Inst Computat Biol, Neuherberg, Germany
Keywords
data integration; dimensionality reduction; multi-omics; personalized medicine; single-cell omics; CHRONIC LYMPHOCYTIC-LEUKEMIA; STATISTICAL FRAMEWORK; DNA METHYLATION; DISCOVERY; DISEASE; IDENTIFICATION; HETEROGENEITY; SUBGROUPS; LANDSCAPE; REDUCTION
DOI
10.15252/msb.20178124
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Multi-omics studies promise the improved characterization of biological processes across molecular layers. However, methods for the unsupervised integration of the resulting heterogeneous data sets are lacking. We present Multi-Omics Factor Analysis (MOFA), a computational method for discovering the principal sources of variation in multi-omics data sets. MOFA infers a set of (hidden) factors that capture biological and technical sources of variability. It disentangles axes of heterogeneity that are shared across multiple modalities and those specific to individual data modalities. The learnt factors enable a variety of downstream analyses, including identification of sample subgroups, data imputation and the detection of outlier samples. We applied MOFA to a cohort of 200 patient samples of chronic lymphocytic leukaemia, profiled for somatic mutations, RNA expression, DNA methylation and ex vivo drug responses. MOFA identified major dimensions of disease heterogeneity, including immunoglobulin heavy-chain variable region status, trisomy of chromosome 12 and previously underappreciated drivers, such as response to oxidative stress. In a second application, we used MOFA to analyse single-cell multi-omics data, identifying coordinated transcriptional and epigenetic changes along cell differentiation.
Pages: 13