Combining billing codes, clinical notes, and medications from electronic health records provides superior phenotyping performance

Cited by: 127
Authors
Wei, Wei-Qi [1]
Teixeira, Pedro L. [1]
Mo, Huan [1]
Cronin, Robert M. [1, 2]
Warner, Jeremy L. [1, 2]
Denny, Joshua C. [1, 2]
Affiliations
[1] Vanderbilt Univ, Dept Biomed Informat, 2525 West End Ave 672, Nashville, TN 37203 USA
[2] Vanderbilt Univ, Dept Med, Nashville, TN 37203 USA
Keywords
diagnosis codes; electronic health records; clinical notes; International Classification of Diseases; problem lists; medications; phenotype; WORD SENSE DISAMBIGUATION; IDENTIFY PATIENTS; ADMINISTRATIVE DATA; HOSPITALIZATION DATA; VALIDATION; ALGORITHM; ACCURACY; DISEASE; VALIDITY; SYSTEMS
DOI
10.1093/jamia/ocv130
CLC classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Objective: To evaluate the phenotyping performance of three major electronic health record (EHR) components: International Classification of Diseases (ICD) diagnosis codes, primary notes, and specific medications.
Materials and Methods: We conducted the evaluation using de-identified Vanderbilt EHR data. We preselected ten diseases: atrial fibrillation, Alzheimer's disease, breast cancer, gout, human immunodeficiency virus infection, multiple sclerosis, Parkinson's disease, rheumatoid arthritis, and types 1 and 2 diabetes mellitus. For each disease, patients were classified into seven categories based on the presence of evidence in diagnosis codes, primary notes, and specific medications. Twenty-five patients per disease category (175 patients per disease, 1750 patients across all ten diseases) were randomly selected for manual chart review. Review results were used to estimate the positive predictive value (PPV), sensitivity, and F-score for each EHR component alone and in combination.
Results: The PPVs of single components were inconsistent and inadequate for accurate phenotyping (0.06-0.71). Using two or more ICD codes improved the average PPV to 0.84. We observed a more stable and higher PPV when using at least two components (mean ± standard deviation: 0.91 ± 0.08). Primary notes offered the best sensitivity (0.77). The sensitivity of ICD codes was 0.67. Again, two or more components provided a reasonably high and stable sensitivity (0.59 ± 0.16). Overall, the best performance (F-score: 0.70 ± 0.12) was achieved by using two or more components. Although the overall performance of ICD codes alone (F-score: 0.67 ± 0.14) was only slightly lower than that of two or more components, their PPV (0.71 ± 0.13) was substantially worse than that of two or more components (0.91 ± 0.08).
Conclusion: Multiple EHR components provide more consistent and higher performance than a single component for the selected phenotypes. We suggest considering multiple EHR components in future phenotyping designs to obtain the best results.
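The sampling design and metrics described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The component names (`icd`, `notes`, `meds`) and all function names are hypothetical. The seven categories are assumed to be the non-empty combinations of the three components, and the F-score is assumed to be the harmonic mean of PPV and sensitivity. The abstract does not say how the stratified review samples were weighted to produce the estimates, so no weighting is attempted here.

```python
from itertools import combinations

# Hypothetical labels for the three EHR components named in the abstract.
COMPONENTS = ("icd", "notes", "meds")
PATIENTS_PER_CATEGORY = 25  # from the abstract
NUM_DISEASES = 10           # from the abstract


def evidence_categories(components=COMPONENTS):
    """Return every non-empty combination of components (2**3 - 1 = 7)."""
    return [
        frozenset(combo)
        for size in range(1, len(components) + 1)
        for combo in combinations(components, size)
    ]


def categorize(has_icd, has_notes, has_meds):
    """Map a patient's evidence flags to a category, or None if no evidence."""
    flags = (has_icd, has_notes, has_meds)
    present = frozenset(name for name, flag in zip(COMPONENTS, flags) if flag)
    return present or None


def ppv(tp, fp):
    """Positive predictive value: confirmed cases among those flagged."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def sensitivity(tp, fn):
    """Fraction of true cases that were flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_score(p, s):
    """Harmonic mean of PPV and sensitivity (assumed F1 definition)."""
    return 2 * p * s / (p + s) if (p + s) else 0.0


categories = evidence_categories()
reviewed_per_disease = PATIENTS_PER_CATEGORY * len(categories)
reviewed_total = reviewed_per_disease * NUM_DISEASES
```

With seven categories and 25 reviewed charts each, `reviewed_per_disease` and `reviewed_total` reproduce the 175 and 1750 patients stated in the abstract.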
Pages: E20-E27
Page count: 8