Combining structured and unstructured data to identify a cohort of ICU patients who received dialysis

Cited by: 44
Authors
Abhyankar, Swapna [1]
Demner-Fushman, Dina [1]
Callaghan, Fiona M. [1]
McDonald, Clement J. [1]
Affiliations
[1] NIH, Natl Lib Med, Lister Hill Natl Ctr Biomed Commun, Bethesda, MD 20892 USA
Funding
National Institutes of Health (USA);
Keywords
IDENTIFICATION; ACCURACY; RECORDS; CODES; TEXT;
DOI
10.1136/amiajnl-2013-001915
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Objective: To develop a generalizable method that uses simple information retrieval (IR) tools to identify patient cohorts from electronic health record (EHR) data; in this case, patients who had dialysis. Methods: We used the coded data and clinical notes from the 24 506 adult patients in the Multiparameter Intelligent Monitoring in Intensive Care database to identify patients who had dialysis. We used SQL queries based on ICD-9 and local codes to search the procedure, diagnosis, and coded nursing observations tables. We used a domain-specific search engine to find clinical notes containing terms related to dialysis. We manually validated the available records for a 10% random sample of patients who potentially had dialysis and for a random sample of 200 patients who were not identified as having dialysis by any of the sources. Results: We identified 1844 patients who potentially had dialysis: 1481 from the three coded sources and 1624 from the clinical notes. Precision for identifying dialysis patients based on available data was estimated to be 78.4% (95% CI 71.9% to 84.2%) and recall was 100% (95% CI 86% to 100%). Conclusions: Combining structured EHR data with information from clinical notes using simple queries increases the utility of both types of data for cohort identification. Patients identified by more than one source are more likely to meet the inclusion criteria; however, including patients found in any of the sources increases recall. This method is attractive because it is available to researchers with access to EHR data and off-the-shelf IR tools.
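The combination step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names and the toy patient IDs are hypothetical, and the overlap figure assumes that the 1844 candidates are exactly the union of the 1481 coded-source patients and the 1624 clinical-note patients.

```python
def combine_sources(coded_ids, note_ids):
    """Return (union, intersection) of two collections of patient IDs.

    The union is the candidate cohort (any source, favoring recall);
    the intersection holds patients found by both kinds of source.
    """
    coded, notes = set(coded_ids), set(note_ids)
    return coded | notes, coded & notes


def overlap_from_counts(n_coded, n_notes, n_union):
    """Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|."""
    return n_coded + n_notes - n_union


# Toy, made-up patient IDs (not taken from the study data).
coded = {101, 102, 103}
notes = {102, 103, 104, 105}
candidates, both = combine_sources(coded, notes)
print(sorted(candidates))  # [101, 102, 103, 104, 105]
print(sorted(both))        # [102, 103]

# Counts reported in the abstract, under the union assumption stated above.
print(overlap_from_counts(1481, 1624, 1844))  # 1261
```

Under that assumption, 1261 patients would appear in both the coded data and the notes, leaving 220 found only in coded data and 363 found only in notes.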
Pages: 801-807
Number of pages: 7