Integrative Data Analysis of Multi-Platform Cancer Data with a Multimodal Deep Learning Approach

Cited by: 188
Authors
Liang, Muxuan [1, 2]
Li, Zhizhong [3]
Chen, Ting [4, 5, 6]
Zeng, Jianyang [7]
Affiliations
[1] Tsinghua Univ, Dept Math Sci, Beijing 100084, Peoples R China
[2] Univ Wisconsin, Dept Stat, Madison, WI 53706 USA
[3] Novartis Res Fdn, Genom Inst, Drug Discovery Oncol Grp, San Diego, CA 92121 USA
[4] Tsinghua Univ, Bioinformat Div, TNLIST, Beijing 100084, Peoples R China
[5] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[6] Univ So Calif, Program Computat Biol & Bioinformat, Los Angeles, CA 90089 USA
[7] Tsinghua Univ, Inst Interdisciplinary Informat Sci, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-platform cancer data analysis; restricted Boltzmann machine; multimodal deep belief network; identification of cancer subtypes; genomic data; clinical data; BREAST; ALGORITHM;
DOI
10.1109/TCBB.2014.2377729
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Identification of cancer subtypes plays an important role in revealing useful insights into disease pathogenesis and advancing personalized therapy. The recent development of high-throughput sequencing technologies has enabled the rapid collection of multi-platform genomic data (e.g., gene expression, miRNA expression, and DNA methylation) for the same set of tumor samples. Although numerous integrative clustering approaches have been developed to analyze cancer data, few are specifically designed to exploit both the deep intrinsic statistical properties of each input modality and the complex cross-modality correlations among multi-platform input data. In this paper, we propose a new machine learning model, called multimodal deep belief network (DBN), to cluster cancer patients from multi-platform observation data. In our integrative clustering framework, relationships among inherent features of each single modality are first encoded into multiple layers of hidden variables, and then a joint latent model is employed to fuse common features derived from multiple input modalities. A practical learning algorithm, called contrastive divergence (CD), is applied to infer the parameters of our multimodal DBN model in an unsupervised manner. Tests on two available cancer datasets show that our integrative data analysis approach can effectively extract a unified representation of latent features to capture both intra- and cross-modality correlations, and identify meaningful disease subtypes from multi-platform cancer data. In addition, our approach can identify key genes and miRNAs that may play distinct roles in the pathogenesis of different cancer subtypes. Among those key miRNAs, we found that the expression level of miR-29a is highly correlated with survival time in ovarian cancer patients. These results indicate that our multimodal DBN-based data analysis approach may have practical applications in cancer pathogenesis studies and provide useful guidelines for personalized cancer therapy.
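The two-stage idea in the abstract (modality-specific hidden layers, then a joint latent layer, all trained with contrastive divergence) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes binary toy data, one hidden layer per modality instead of several, a single Gibbs step (CD-1), and arbitrary layer sizes and learning rate. All function and variable names (`train_rbm`, `cd1_update`, `mod_a`, `mod_b`, and so on) are hypothetical.

```python
import math
import random

def sigmoid(x):
    x = max(-30.0, min(30.0, x))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-x))

def hidden_probs(v, W, c):
    """P(h_j = 1 | v) for every hidden unit j."""
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]

def visible_probs(h, W, b):
    """P(v_i = 1 | h) for every visible unit i."""
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]

def cd1_update(v0, W, b, c, lr, rng):
    """Apply one CD-1 update in place; return the squared reconstruction error."""
    ph0 = hidden_probs(v0, W, c)                          # positive phase
    h0 = [1.0 if rng.random() < p else 0.0 for p in ph0]  # sample hidden states
    pv1 = visible_probs(h0, W, b)                         # one Gibbs step back
    ph1 = hidden_probs(pv1, W, c)                         # negative phase
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
        b[i] += lr * (v0[i] - pv1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])
    return sum((x - p) ** 2 for x, p in zip(v0, pv1))

def train_rbm(data, n_hidden, epochs=300, lr=0.1, seed=0):
    """Train a Bernoulli RBM with CD-1; return weights, biases, per-epoch error."""
    rng = random.Random(seed)
    n_visible = len(data[0])
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
    b = [0.0] * n_visible
    c = [0.0] * n_hidden
    errors = []
    for _ in range(epochs):
        total = sum(cd1_update(v, W, b, c, lr, rng) for v in data)
        errors.append(total / len(data))
    return W, b, c, errors

# Toy data (assumption): six samples, two binary modalities of four features each.
mod_a = [[1, 1, 0, 0]] * 3 + [[0, 0, 1, 1]] * 3
mod_b = [[1, 0, 1, 0]] * 3 + [[0, 1, 0, 1]] * 3

# Step 1: one RBM per modality captures intra-modality structure.
Wa, ba, ca, errs_a = train_rbm(mod_a, n_hidden=2, seed=1)
Wb, bb, cb, errs_b = train_rbm(mod_b, n_hidden=2, seed=2)

# Step 2: a joint RBM over the concatenated hidden activations fuses modalities.
fused_in = [hidden_probs(a, Wa, ca) + hidden_probs(b, Wb, cb)
            for a, b in zip(mod_a, mod_b)]
Wj, bj, cj, errs_j = train_rbm(fused_in, n_hidden=2, seed=3)

# Unified latent representation, one vector per sample.
joint_features = [hidden_probs(v, Wj, cj) for v in fused_in]
```

In a clustering setting, vectors such as `joint_features` would be passed to a clustering step to group patients. Real expression or methylation data are continuous, so a practical model would need normalization or real-valued visible units, which this sketch omits.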
Pages: 928-937
Page count: 10