Granite: Diversified, Sparse Tensor Factorization for Electronic Health Record-Based Phenotyping

Cited by: 21
Authors
Henderson, Jette [1]
Ho, Joyce C. [2]
Kho, Abel N. [3]
Denny, Joshua C. [4]
Malin, Bradley A. [4]
Sun, Jimeng [5]
Ghosh, Joydeep [1]
Affiliations
[1] Univ Texas Austin, Austin, TX 78712 USA
[2] Emory Univ, Atlanta, GA 30322 USA
[3] Northwestern Univ, Evanston, IL 60208 USA
[4] Vanderbilt Univ, 221 Kirkland Hall, Nashville, TN 37235 USA
[5] Georgia Inst Technol, Atlanta, GA 30332 USA
Source
2017 IEEE International Conference on Healthcare Informatics (ICHI) | 2017
Funding
National Science Foundation (USA);
Keywords
Feature extraction; Data mining; Health information management; Computational phenotyping; Tensor factorization; Electronic health records; HIGH-THROUGHPUT; HYPERTENSION; GENERATION; SYSTEMS;
DOI
10.1109/ICHI.2017.61
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
One of the most formidable challenges electronic health records (EHRs) pose for traditional analytics is the inability to map directly (or reliably) to medical concepts or phenotypes. Among other things, EHR-based phenotyping can help identify and target patients for interventions and improve real-time clinical decisions. Existing phenotyping approaches often require labor-intensive supervision from medical experts or do not focus on generating concise and diverse phenotypes. Sparsity in phenotypes is key to making them interpretable and useful to clinicians, while diversity allows clinicians to grasp the main features of a patient population quickly. In this paper, we introduce Granite, a diversified, sparse nonnegative tensor factorization method to derive phenotypes with limited human supervision. Compared to existing high-throughput phenotyping techniques, Granite yields phenotypes with much more distinct (non-overlapping) elements that can, as an artifact, capture rare phenotypes. Moreover, the resulting concise phenotypes retain predictive power comparable to or surpassing existing dimensionality reduction techniques. We evaluate Granite by comparing its resulting phenotypes with those generated using state-of-the-art, high-throughput methods on simulated as well as real EHR data. Our algorithm offers a promising and novel data-driven solution to rapidly characterize, predict, and manage a wide range of diseases.
Pages: 214-223
Number of pages: 10