Limestone: High-throughput candidate phenotype generation via tensor factorization

Cited by: 101
Authors
Ho, Joyce C. [1]
Ghosh, Joydeep [1]
Steinhubl, Steve R. [2]
Stewart, Walter F. [3]
Denny, Joshua C. [4, 5]
Malin, Bradley A. [4, 6]
Sun, Jimeng [7]
Affiliations
[1] Univ Texas Austin, Dept Elect & Comp Engn, Austin, TX 78712 USA
[2] Scripps Hlth, Scripps Translat Sci Inst, La Jolla, CA 92037 USA
[3] Sutter Hlth, Sutter Hlth Res Dev & Disseminat Team, Walnut Creek, CA 94598 USA
[4] Vanderbilt Univ, Dept Biomed Informat, Nashville, TN 37232 USA
[5] Vanderbilt Univ, Dept Med, Nashville, TN 37232 USA
[6] Vanderbilt Univ, Dept Elect Engn & Comp Sci, Nashville, TN 37232 USA
[7] Georgia Inst Technol, Coll Comp, Sch Computat Sci & Engn, Atlanta, GA 30332 USA
Funding
US National Science Foundation
Keywords
Dimensionality reduction; Nonnegative tensor factorization; EHR phenotyping; ELECTRONIC HEALTH RECORDS; NONNEGATIVE MATRIX FACTORIZATION; HEART-FAILURE; MEDICAL-RECORDS; EMERGE NETWORK; CARE; ALGORITHMS; PREDICTION; DISEASE; DECOMPOSITIONS;
DOI
10.1016/j.jbi.2014.07.001
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
The rapidly increasing availability of electronic health records (EHRs) from multiple heterogeneous sources has spearheaded the adoption of data-driven approaches for improved clinical research, decision making, prognosis, and patient management. Unfortunately, EHR data do not always directly and reliably map to medical concepts that clinical researchers need or use. Some recent studies have focused on EHR-derived phenotyping, which aims at mapping the EHR data to specific medical concepts; however, most of these approaches require labor-intensive supervision from experienced clinical professionals. Furthermore, existing approaches are often disease-centric and specialized to the idiosyncrasies of the information technology and/or business practices of a single healthcare organization. In this paper, we propose Limestone, a nonnegative tensor factorization method to derive phenotype candidates with virtually no human supervision. Limestone represents the data source interactions naturally using tensors (a generalization of matrices). In particular, we investigate the interaction of diagnoses and medications among patients. The resulting tensor factors are reported as phenotype candidates that automatically reveal patient clusters on specific diagnoses and medications. Using the proposed method, multiple phenotypes can be identified simultaneously from data. We demonstrate the capability of Limestone on a cohort of 31,815 patient records from the Geisinger Health System. The dataset spans 7 years of longitudinal patient records and was initially constructed for a heart failure onset prediction study. Our experiments demonstrate the robustness, stability, and the conciseness of Limestone-derived phenotypes. Our results show that using only 40 phenotypes, we can outperform the original 640 features (169 diagnosis categories and 471 medication types) to achieve an area under the receiver operator characteristic curve (AUC) of 0.720 (95% CI 0.715 to 0.725). Moreover, in consultation with a medical expert, we confirmed 82% of the top 50 candidates automatically extracted by Limestone are clinically meaningful. (C) 2014 Elsevier Inc. All rights reserved.
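The factorization idea described in the abstract can be illustrated with a small sketch. This is not the Limestone algorithm itself, whose details the abstract does not give. It is a generic nonnegative CP (CANDECOMP/PARAFAC) factorization with multiplicative updates under a least-squares loss, applied to a hypothetical dense patient x diagnosis x medication tensor. The function names (`khatri_rao`, `unfold`, `nonneg_cp`), the tensor sizes, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def khatri_rao(a, b):
    """Column-wise Kronecker product: (I x R), (J x R) -> (I*J x R)."""
    rank = a.shape[1]
    return (a[:, None, :] * b[None, :, :]).reshape(-1, rank)


def unfold(x, mode):
    """Mode-n unfolding: bring `mode` to the front, flatten the rest."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)


def nonneg_cp(x, rank, n_iter=300, eps=1e-9, seed=0):
    """Nonnegative CP of a 3-way tensor via multiplicative updates.

    Illustrative only: least-squares loss, dense input, fixed iteration count.
    """
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in x.shape]
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            numer = unfold(x, mode) @ kr
            denom = factors[mode] @ (kr.T @ kr) + eps
            # Multiplying by a nonnegative ratio keeps the factor nonnegative.
            factors[mode] *= numer / denom
    return factors


# Synthetic stand-in for EHR data: 30 patients, 8 diagnoses, 6 medications,
# generated from 2 hidden "phenotypes" (hypothetical sizes).
data_rng = np.random.default_rng(1)
true_factors = [data_rng.random((n, 2)) for n in (30, 8, 6)]
x = np.einsum("ir,jr,kr->ijk", *true_factors)

patients, diagnoses, medications = nonneg_cp(x, rank=2)
approx = np.einsum("ir,jr,kr->ijk", patients, diagnoses, medications)
rel_err = np.linalg.norm(x - approx) / np.linalg.norm(x)
print(f"relative reconstruction error: {rel_err:.3f}")
```

In this sketch, column r of `diagnoses` and `medications` together plays the role of one candidate phenotype, and column r of `patients` indicates how strongly each patient expresses it.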
Pages: 199-211
Number of pages: 13