Natural language processing for the development of a clinical registry: a validation study in intraductal papillary mucinous neoplasms

Cited by: 36
Authors
Al-Haddad, Mohammad A. [1]
Friedlin, Jeff [2, 4]
Kesterson, Joe [4]
Waters, Joshua A. [3]
Aguilar-Saavedra, Juan R. [3]
Schmidt, C. Max [3]
Affiliations
[1] Indiana Univ Sch Med, Dept Med, Indianapolis, IN USA
[2] Indiana Univ Sch Med, Dept Family Med, Indianapolis, IN USA
[3] Indiana Univ Sch Med, Dept Surg, Indianapolis, IN USA
[4] Regenstrief Inst Hlth Care, Indianapolis, IN 46202 USA
Keywords
Intraductal papillary mucinous neoplasm; pancreatic cancer; prevention; cystic neoplasm; precancerous; natural language processing; data mining; databases; pancreas
DOI
10.1111/j.1477-2574.2010.00235.x
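Chinese Library Classification (CLC) number
R57 [Diseases of the digestive system and abdomen]
Discipline classification code
100201 [Internal medicine]
Abstract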
Background: Medical natural language processing (NLP) systems have been developed to identify, extract and encode information within clinical narrative text. However, the role of NLP in clinical research and patient care remains limited. Pancreatic cysts are common. Some pancreatic cysts, such as intraductal papillary mucinous neoplasms (IPMNs), have malignant potential and require extended periods of surveillance. We seek to develop a novel NLP system that could be applied in our clinical network to develop a functional registry of IPMN patients.
Objectives: This study aims to validate the accuracy of our novel NLP system in the identification of surgical patients with pathologically confirmed IPMN in comparison with our pre-existing manually created surgical database (standard reference).
Methods: The Regenstrief EXtraction Tool (REX) was used to extract pancreatic cyst patient data from medical text files from Indiana University Health. The system was assessed periodically by direct sampling and review of medical records. Results were compared with the standard reference.
Results: Natural language processing detected 5694 unique patients with pancreas cysts, in 215 of whom surgical pathology had confirmed IPMN. The NLP software identified all but seven patients present in the surgical database and identified an additional 37 IPMN patients not previously included in the surgical database. Using the standard reference, the sensitivity of the NLP program was 97.5% (95% confidence interval [CI] 94.8-98.9%) and its positive predictive value was 95.5% (95% CI 92.3-97.5%).
Conclusions: Natural language processing is a reliable and accurate method for identifying selected patient cohorts and may facilitate the identification and follow-up of patients with IPMN.
Pages: 688-695
Number of pages: 8