Combining structured and unstructured data for predictive models: a deep learning approach

Cited by: 141
Authors
Zhang, Dongdong [1,2]
Yin, Changchang [3]
Zeng, Jucheng [1,2]
Yuan, Xiaohui [2]
Zhang, Ping [1,3]
Affiliations
[1] Ohio State Univ, Dept Biomed Informat, 1800 Cannon Dr, Columbus, OH 43210 USA
[2] Wuhan Univ Technol, Sch Comp Sci & Technol, Wuhan 430070, Hubei, Peoples R China
[3] Ohio State Univ, Dept Comp Sci & Engn, 2015 Neil Ave, Columbus, OH 43210 USA
Keywords
Electronic health records; Deep learning; Data fusion; Time series forecasting; Hospital mortality; Readmission
DOI
10.1186/s12911-020-01297-6
CLC number
R-058
Abstract
Background: The broad adoption of electronic health records (EHRs) provides great opportunities to conduct health care research and solve various clinical problems in medicine. With recent advances and success, methods based on machine learning and deep learning have become increasingly popular in medical informatics. However, while many studies use temporal structured data for predictive modeling, they typically neglect potentially valuable information in unstructured clinical notes. Integrating heterogeneous data types across EHRs through deep learning techniques may help improve the performance of prediction models.
Methods: We propose two general-purpose multi-modal neural network architectures that enhance patient representation learning by combining sequential unstructured notes with structured data. The proposed fusion models use document embeddings to represent long clinical note documents, either convolutional neural networks or long short-term memory networks to model the sequential clinical notes and temporal signals, and one-hot encoding to represent static information. The concatenation of these representations is the final patient representation, which is used to make predictions.
Results: We evaluate the proposed models on three risk prediction tasks (in-hospital mortality, 30-day hospital readmission, and long length of stay) using data derived from the publicly available Medical Information Mart for Intensive Care III dataset. Our results show that, by combining unstructured clinical notes with structured data, the proposed models outperform other models that use either unstructured notes or structured data only.
Conclusions: The proposed fusion models learn better patient representations by combining structured and unstructured data. Integrating heterogeneous data types across EHRs helps improve the performance of prediction models and reduce errors.
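The concatenation step described under Methods can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy values, and the admission-type vocabulary are all hypothetical, and simple mean pooling stands in for the convolutional or long short-term memory encoders so that the example needs only the Python standard library.

```python
from typing import List, Sequence


def one_hot(value: str, vocabulary: Sequence[str]) -> List[float]:
    """Encode one categorical static feature as a one-hot vector."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def mean_pool(sequence: Sequence[Sequence[float]]) -> List[float]:
    """Placeholder sequence encoder: average the vectors over time steps.

    Assumption: the abstract names a CNN or LSTM for this role; mean
    pooling is used here only to keep the sketch dependency-free.
    """
    if not sequence:
        raise ValueError("sequence must contain at least one time step")
    steps = len(sequence)
    return [sum(column) / steps for column in zip(*sequence)]


def fuse(
    note_embeddings: Sequence[Sequence[float]],
    temporal_signals: Sequence[Sequence[float]],
    static_value: str,
    static_vocabulary: Sequence[str],
) -> List[float]:
    """Concatenate the three modality representations into one patient vector."""
    return (
        mean_pool(note_embeddings)
        + mean_pool(temporal_signals)
        + one_hot(static_value, static_vocabulary)
    )


# Toy inputs (hypothetical values, not taken from MIMIC-III).
note_embeddings = [[1.0, 2.0], [3.0, 4.0]]          # 2 notes, 2-dim embeddings
temporal_signals = [[80.0, 120.0], [90.0, 110.0], [100.0, 130.0]]  # 3 time steps
admission_types = ["ELECTIVE", "EMERGENCY", "URGENT"]

patient_vector = fuse(note_embeddings, temporal_signals, "EMERGENCY", admission_types)
print(patient_vector)
```

In a trained model the pooled vectors would come from learned encoders and the concatenated vector would feed a prediction layer; the sketch only shows how the three modalities end up in a single representation.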
Pages: 11