Improving Case Definition of Crohn's Disease and Ulcerative Colitis in Electronic Medical Records Using Natural Language Processing: A Novel Informatics Approach

Citations: 131
Authors
Ananthakrishnan, Ashwin N. [1,2]
Cai, Tianxi [3]
Savova, Guergana [4]
Cheng, Su-Chun [2]
Chen, Pei [4]
Perez, Raul Guzman [5]
Gainer, Vivian S. [5]
Murphy, Shawn N. [5,6]
Szolovits, Peter [7]
Xia, Zongqi [2,8]
Shaw, Stanley [2,9]
Churchill, Susanne [10]
Karlson, Elizabeth W. [2,11]
Kohane, Isaac [2,4,10]
Plenge, Robert M. [2,11]
Liao, Katherine P. [2,11]
Affiliations
[1] Massachusetts Gen Hosp, Gastrointestinal Unit, Boston, MA 02114 USA
[2] Harvard Univ, Sch Med, Boston, MA USA
[3] Harvard Univ, Sch Publ Hlth, Dept Biostat, Boston, MA 02115 USA
[4] Childrens Hosp, Boston, MA 02115 USA
[5] Partners HealthCare, Res Comp, Charlestown, MA USA
[6] Massachusetts Gen Hosp, Dept Neurol, Boston, MA 02114 USA
[7] MIT, Cambridge, MA 02139 USA
[8] Brigham & Womens Hosp, Dept Neurol, Boston, MA 02115 USA
[9] Massachusetts Gen Hosp, Div Cardiol, Boston, MA 02114 USA
[10] Brigham & Womens Hosp, Natl Ctr Biomed Comp i2b2, Boston, MA 02115 USA
[11] Brigham & Womens Hosp, Div Rheumatol, Boston, MA 02115 USA
Funding
National Institutes of Health (USA);
Keywords
Crohn's disease; ulcerative colitis; disease cohort; natural language processing; informatics; RHEUMATOID-ARTHRITIS; HEALTH RECORDS; CLINICAL NARRATIVES; EXTRACTION SYSTEM; OLMSTED COUNTY; PREVALENCE; DISCOVERY; DIAGNOSES; MINNESOTA; SURVIVAL;
DOI
10.1097/MIB.0b013e31828133fd
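Chinese Library Classification number
R57 [Diseases of the digestive system and abdomen];
Discipline classification code
100201 [Internal medicine];
Abstract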
Background: Previous studies identifying patients with inflammatory bowel disease using administrative codes have yielded inconsistent results. Our objective was to develop a robust electronic medical record-based model for classification of inflammatory bowel disease leveraging the combination of codified data and information from clinical text notes using natural language processing.
Methods: Using the electronic medical records of 2 large academic centers, we created data marts for Crohn's disease (CD) and ulcerative colitis (UC) comprising patients with ≥1 International Classification of Diseases, 9th edition, code for each disease. We used codified (i.e., International Classification of Diseases, 9th edition codes, electronic prescriptions) and narrative data from clinical notes to develop our classification model. Model development and validation were performed in a training set of 600 randomly selected patients for each disease with medical record review as the gold standard. Logistic regression with the adaptive LASSO penalty was used to select informative variables.
Results: We confirmed 399 CD cases (67%) in the CD training set and 378 UC cases (63%) in the UC training set. For both, a combined model including narrative and codified data had better accuracy (area under the curve for CD 0.95; UC 0.94) than models using only disease International Classification of Diseases, 9th edition codes (area under the curve 0.89 for CD; 0.86 for UC). Addition of natural language processing narrative terms to our final model resulted in classification of 6% to 12% more subjects with the same accuracy.
Conclusions: Inclusion of narrative concepts identified using natural language processing improves the accuracy of electronic medical records case definition for CD and UC while simultaneously identifying more subjects compared with models using codified data alone.
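The abstract names two techniques, logistic regression with an adaptive LASSO penalty for variable selection and area under the curve (AUC) for accuracy. The sketch below illustrates both on a tiny synthetic data set. It is not the authors' implementation: the two-stage proximal gradient solver, the tuning values (`lam`, `gamma`, `ridge`, `step`, `iters`), the helper names, and the toy data (one informative feature and one uninformative feature, standing in for something like a diagnosis-code indicator and an irrelevant narrative term) are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def soft_threshold(value, threshold):
    # Proximal operator of the L1 penalty: shrink toward zero, clip at zero.
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_logistic(X, y, l1_weights=None, lam=0.0, ridge=0.0, step=0.5, iters=2000):
    """Proximal gradient descent for penalized logistic regression.
    Returns (intercept, coefficients); the intercept is never penalized."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    l1_weights = l1_weights or [0.0] * p
    for _ in range(iters):
        residuals = [sigmoid(b0 + sum(b * x for b, x in zip(beta, row))) - yi
                     for row, yi in zip(X, y)]
        g0 = sum(residuals) / n
        grads = [sum(r * row[j] for r, row in zip(residuals, X)) / n + ridge * beta[j]
                 for j in range(p)]
        b0 -= step * g0
        beta = [soft_threshold(beta[j] - step * grads[j], step * lam * l1_weights[j])
                for j in range(p)]
    return b0, beta

def adaptive_lasso_logistic(X, y, lam=0.01, gamma=1.0, eps=1e-8):
    # Stage 1 (assumed choice): a lightly ridge-penalized fit gives initial estimates.
    _, init = fit_logistic(X, y, ridge=0.01)
    # Stage 2: weighted L1 penalty; weak initial estimates get large weights,
    # so their coefficients are the first to be shrunk to exactly zero.
    weights = [1.0 / (abs(b) + eps) ** gamma for b in init]
    return fit_logistic(X, y, l1_weights=weights, lam=lam)

def auc(scores, labels):
    # Probability that a random case outscores a random non-case (ties count half).
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def toy_data():
    # Hypothetical balanced design: x1 (coded +1/-1) drives the outcome,
    # x2 (coded +1/-1) carries no information about it.
    X, y = [], []
    for x1 in (1.0, -1.0):
        for x2 in (1.0, -1.0):
            n_pos = 8 if x1 > 0 else 2
            for k in range(10):
                X.append([x1, x2])
                y.append(1 if k < n_pos else 0)
    return X, y

if __name__ == "__main__":
    X, y = toy_data()
    b0, beta = adaptive_lasso_logistic(X, y)
    scores = [b0 + sum(b * x for b, x in zip(beta, row)) for row in X]
    print("coefficients:", beta)
    print("AUC:", auc(scores, y))
```

On the toy data the penalty removes the uninformative feature while keeping the informative one, which mirrors the role the abstract assigns to the adaptive LASSO: selecting informative variables from a larger pool of codified and narrative candidates. In practice a validated library solver and cross-validated tuning would replace this hand-rolled loop.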
Pages: 1411-1420
Page count: 10