Automated encoding of clinical documents based on natural language processing

Cited by: 289
Authors
Friedman, C [1]
Shagina, L [1]
Lussier, Y [1]
Hripcsak, G [1]
Affiliation
[1] Columbia Univ, Dept Biomed Informat, Coll Phys & Surg, New York, NY 10032 USA
DOI
10.1197/jamia.M1552
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Objective: The aim of this study was to develop a method based on natural language processing (NLP) that automatically maps an entire clinical document to codes with modifiers and to quantitatively evaluate the method. Methods: An existing NLP system, MedLEE, was adapted to automatically generate codes. The method involves matching of structured output generated by MedLEE consisting of findings and modifiers to obtain the most specific code. Recall and precision applied to Unified Medical Language System (UMLS) coding were evaluated in two separate studies. Recall was measured using a test set of 150 randomly selected sentences, which were processed using MedLEE. Results were compared with a reference standard determined manually by seven experts. Precision was measured using a second test set of 150 randomly selected sentences from which UMLS codes were automatically generated by the method and then validated by experts. Results: Recall of the system for UMLS coding of all terms was .77 (95% CI .72-.81), and for coding terms that had corresponding UMLS codes recall was .83 (.79-.87). Recall of the system for extracting all terms was .84 (.81-.88). Recall of the experts ranged from .69 to .91 for extracting terms. The precision of the system was .89 (.87-.91), and precision of the experts ranged from .61 to .91. Conclusion: Extraction of relevant clinical information and UMLS coding were accomplished using a method based on NLP. The method appeared to be comparable to or better than six experts. The advantage of the method is that it maps text to codes along with other related information, rendering the coded output suitable for effective retrieval.
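The recall and precision figures in the abstract are proportions reported with 95% confidence intervals. The sketch below shows one common way such a figure could be computed. It is illustrative only: the abstract does not state how the intervals were derived, so the normal (Wald) approximation used here is an assumption, the function names are hypothetical, and the counts in the example are invented rather than taken from the study.

```python
import math


def proportion_with_ci(hits, total, z=1.96):
    """Return (estimate, lower, upper) for hits/total.

    Uses the normal (Wald) approximation, which is an assumption here;
    the interval method used in the study is not given in the abstract.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = hits / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def recall(true_positives, false_negatives):
    """Share of reference-standard terms that the system found."""
    return proportion_with_ci(true_positives, true_positives + false_negatives)


def precision(true_positives, false_positives):
    """Share of system-generated codes that experts judged correct."""
    return proportion_with_ci(true_positives, true_positives + false_positives)


# Hypothetical counts, not data from the study.
example_recall = recall(true_positives=80, false_negatives=20)
example_precision = precision(true_positives=90, false_positives=10)
print("recall: %.2f (%.2f-%.2f)" % example_recall)
print("precision: %.2f (%.2f-%.2f)" % example_precision)
```

Recall is measured against a reference standard (what should have been found), while precision is measured against the system's own output (what was found and is correct), which is why the abstract describes two separate test sets.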
Pages: 392-402
Page count: 11