Deep Patient: An Unsupervised Representation to Predict the Future of Patients from the Electronic Health Records

Cited by: 891
Authors
Miotto, Riccardo [1, 2, 3]
Li, Li [1, 2, 3]
Kidd, Brian A. [1, 2, 3]
Dudley, Joel T. [1, 2, 3]
Affiliations
[1] Icahn Sch Med Mt Sinai, Dept Genet & Genom Sci, New York, NY 10029 USA
[2] Icahn Sch Med Mt Sinai, Harris Ctr Precis Wellness, New York, NY 10029 USA
[3] Icahn Sch Med Mt Sinai, Icahn Inst Genom & Multiscale Biol, New York, NY 10029 USA
Source
SCIENTIFIC REPORTS | 2016 / Volume 6
Keywords
RISK PREDICTION; CLASSIFICATION; STRATEGIES; DISORDERS; DIAGNOSIS
DOI
10.1038/srep26094
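Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract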
Secondary use of electronic health records (EHRs) promises to advance clinical research and better inform clinical decision making. Challenges in summarizing and representing patient data prevent widespread practice of predictive modeling using EHRs. Here we present a novel unsupervised deep feature learning method to derive a general-purpose patient representation from EHR data that facilitates clinical predictive modeling. In particular, a three-layer stack of denoising autoencoders was used to capture hierarchical regularities and dependencies in the aggregated EHRs of about 700,000 patients from the Mount Sinai data warehouse. The result is a representation we name "deep patient". We evaluated this representation as broadly predictive of health states by assessing the probability of patients developing various diseases. We performed the evaluation using 76,214 test patients comprising 78 diseases from diverse clinical domains and temporal windows. Our results significantly outperformed those achieved using representations based on raw EHR data and alternative feature learning strategies. Prediction performance for severe diabetes, schizophrenia, and various cancers was among the highest. These findings indicate that deep learning applied to EHRs can derive patient representations that offer improved clinical predictions, and could provide a machine learning framework for augmenting clinical decision systems.
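The abstract's "three-layer stack of denoising autoencoders" can be illustrated with a small sketch. This is not the authors' implementation: the layer sizes, masking-noise rate, learning rate, tied weights, cross-entropy loss, and the random toy matrix standing in for aggregated EHR features are all assumptions chosen for brevity, and the helper names (`train_denoising_autoencoder`, `fit_stack`, `encode`) are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_denoising_autoencoder(X, n_hidden, rng, noise=0.05, lr=0.1, epochs=50):
    """Train one tied-weight denoising autoencoder; return encoder (W, b)."""
    n_samples, n_visible = X.shape
    W = rng.normal(0.0, 0.1, size=(n_visible, n_hidden))
    b = np.zeros(n_hidden)   # encoder bias
    c = np.zeros(n_visible)  # decoder bias
    for _ in range(epochs):
        # Masking noise: zero out a random fraction of the inputs.
        mask = rng.random(X.shape) >= noise
        X_noisy = X * mask
        H = sigmoid(X_noisy @ W + b)   # encode the corrupted input
        X_hat = sigmoid(H @ W.T + c)   # reconstruct the clean input
        # Gradient of the mean cross-entropy loss w.r.t. the decoder pre-activation.
        d_out = (X_hat - X) / n_samples
        d_hid = (d_out @ W) * H * (1.0 - H)
        # Tied weights: W receives gradient from both encoder and decoder.
        W -= lr * (X_noisy.T @ d_hid + d_out.T @ H)
        b -= lr * d_hid.sum(axis=0)
        c -= lr * d_out.sum(axis=0)
    return W, b


def fit_stack(X, layer_sizes=(16, 12, 8), seed=0, **kwargs):
    """Greedy layer-wise training: each layer learns from the previous layer's output."""
    rng = np.random.default_rng(seed)
    layers, H = [], X
    for n_hidden in layer_sizes:
        W, b = train_denoising_autoencoder(H, n_hidden, rng, **kwargs)
        layers.append((W, b))
        H = sigmoid(H @ W + b)  # uncorrupted encoding feeds the next layer
    return layers


def encode(X, layers):
    """Map raw feature vectors to the top-layer representation."""
    H = X
    for W, b in layers:
        H = sigmoid(H @ W + b)
    return H


# Toy data (assumption): 200 "patients" with 30 features scaled to [0, 1].
X_toy = np.random.default_rng(42).random((200, 30))
layers = fit_stack(X_toy)
Z = encode(X_toy, layers)
print(Z.shape)  # (200, 8)
```

In this sketch the final matrix `Z` plays the role of the learned patient representation; a downstream classifier would be trained on `Z` rather than on the raw features.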
Pages: 10