Assessment of commercial NLP engines for medication information extraction from dictated clinical notes

Cited by: 42
Authors
Jagannathan, V. [1]
Mullett, Charles J. [2]
Arbogast, James G. [2]
Halbritter, Kevin A. [2]
Yellapragada, Deepthi [2]
Regulapati, Sushmitha [2]
Bandaru, Pavani [2]
Affiliations
[1] MedQuist Inc, Morgantown, WV 26505, USA
[2] West Virginia University, Morgantown, WV 26506, USA
Keywords
Natural language processing (NLP); Medication extraction; Text mining; Documents
DOI
10.1016/j.ijmedinf.2008.08.006
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Purpose: We assessed the current state of commercial natural language processing (NLP) engines for their ability to extract medication information from textual clinical documents.
Methods: Two thousand de-identified discharge summaries and family practice notes were submitted to four commercial NLP engines with the request to extract all medication information. The four sets of returned results were combined to create a comparison standard, which was validated against a manual, physician-derived gold standard created from a subset of 100 reports. Once validated, the individual vendor results for medication names, strengths, route, and frequency were compared against this automated standard, and precision, recall, and F measures were calculated.
Results: Compared with the manual, physician-derived gold standard, the automated standard was successful at accurately capturing medication names (F measure = 93.2%), but performed less well with strength (85.3%) and route (80.3%), and relatively poorly with dosing frequency (48.3%). Moderate variability was seen in the relative strengths of the four vendors. In an analysis comparing the two document types, the vendors performed better with the structured discharge summaries than with the clinic notes.
Conclusion: Although automated extraction may serve as the foundation for a manual review process, it is not ready to automate medication lists without human intervention.
© 2008 Elsevier Ireland Ltd. All rights reserved.
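The evaluation scores each vendor's extracted medication attributes against a reference standard. It uses precision P = TP / (TP + FP) and recall R = TP / (TP + FN). Assuming the balanced form, the F measure is F = 2PR / (P + R).

The Python (3.9+) sketch that follows illustrates that arithmetic under two assumptions:
- Each extraction is modelled as a hypothetical (document, medication, attribute, value) tuple.
- The comparison standard is built by a hypothetical majority vote over the vendor outputs.

The abstract does not say how the four result sets were combined. The names `majority_standard`, `prf`, `prf_by_attribute` and the `min_votes` threshold are therefore illustrative, not the authors' method. The toy data are invented.

```python
from collections import Counter

# Hypothetical representation (an assumption, not the paper's data model):
# one extracted fact is (document_id, medication, attribute, value),
# e.g. ("d1", "aspirin", "strength", "81 mg").
Extraction = tuple[str, str, str, str]


def majority_standard(vendor_outputs: list[set[Extraction]], min_votes: int) -> set[Extraction]:
    """Hypothetical comparison standard: keep facts reported by >= min_votes vendors."""
    # Each vendor output is a set, so a fact is counted at most once per vendor.
    votes = Counter(fact for output in vendor_outputs for fact in output)
    return {fact for fact, n in votes.items() if n >= min_votes}


def prf(predicted: set[Extraction], reference: set[Extraction]) -> tuple[float, float, float]:
    """Precision, recall and balanced F measure of predicted facts against a reference set."""
    tp = len(predicted & reference)  # true positives: facts present in both sets
    precision = tp / len(predicted) if predicted else 0.0  # TP / (TP + FP)
    recall = tp / len(reference) if reference else 0.0  # TP / (TP + FN)
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def prf_by_attribute(
    predicted: set[Extraction], reference: set[Extraction], attribute: str
) -> tuple[float, float, float]:
    """Score a single attribute (e.g. "route") separately, since F is reported per attribute."""
    return prf(
        {fact for fact in predicted if fact[2] == attribute},
        {fact for fact in reference if fact[2] == attribute},
    )


# Invented toy outputs from four vendors for one document (not study data).
vendors = [
    {("d1", "aspirin", "strength", "81 mg"), ("d1", "aspirin", "route", "oral")},
    {("d1", "aspirin", "strength", "81 mg")},
    {("d1", "aspirin", "strength", "81 mg"), ("d1", "aspirin", "route", "po")},
    {("d1", "aspirin", "route", "oral")},
]
standard = majority_standard(vendors, min_votes=2)
for i, output in enumerate(vendors, start=1):
    print(f"vendor {i}: P/R/F =", prf(output, standard))
```

Matching here is exact string equality, so vendor 3 is penalised for reporting "po" where the standard says "oral". A realistic comparison would normalise values (routes, dose units, frequency expressions) before matching. Free-text attributes such as dosing frequency are the hardest to normalise.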
Pages: 284-291
Number of pages: 8
References (23 in total)
[1] Chapman WW, 2007, BIONLP 2007 BIOL TRA, p. 81
[2] Chapman WW, 2001, J AM MED INFORM ASSN, p. 105
[3] Cimino JJ, 2007, STUD HEALTH TECHNOL, vol. 129, p. 679
[4] Clark, Cheryl; Good, Kathleen; Jezierny, Lesley; Macpherson, Melissa; Wilson, Brian; Chajewska, Urszula. Identifying smokers with a medical extraction system. Journal of the American Medical Informatics Association, 2008, 15(1): 36-39.
[5] Denecke K, 2007, LECT NOTES COMPUTER, p. 257
[6] Dolin, RH; Alschuler, L; Boyer, S; Beebe, C; Behlen, FM; Biron, PV; Shabo, A. HL7 Clinical Document Architecture, Release 2. Journal of the American Medical Informatics Association, 2006, 13(1): 30-39.
[7] Friedman C, 1997, J AM MED INFORM ASSN, p. 595
[8] Friedman, C; Shagina, L; Lussier, Y; Hripcsak, G. Automated encoding of clinical documents based on natural language processing. Journal of the American Medical Informatics Association, 2004, 11(5): 392-402.
[9] Gupta, D; Saul, M; Gilbertson, J. Evaluation of a deidentification (De-Id) software engine to share pathology reports and clinical documents for research. American Journal of Clinical Pathology, 2004, 121(2): 176-186.
[10] Hazlehurst, B; Frost, RH; Sittig, DF; Stevens, VJ. MediClass: A system for detecting and classifying encounter-based clinical events in any electronic medical record. Journal of the American Medical Informatics Association, 2005, 12(5): 517-529.