An experiment comparing lexical and statistical methods for extracting MeSH terms from clinical free text

Cited by: 35
Authors
Cooper, GF
Miller, RA
Affiliations
[1] Univ Pittsburgh, Ctr Biomed Informat, Pittsburgh, PA 15213 USA
[2] Vanderbilt Univ, Div Biomed Informat, Nashville, TN USA
Keywords
DOI
10.1136/jamia.1998.0050062
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Objective: A primary goal of the University of Pittsburgh's 1990-94 UMLS-sponsored effort was to develop and evaluate PostDoc (a lexical indexing system) and Finder (a statistical indexing system) comparatively, and then in combination as a hybrid system. Each system takes as input a portion of the free text from a narrative part of a patient's electronic medical record and returns a list of suggested MeSH terms to use in formulating a Medline search that includes concepts in the text. This paper describes the systems and reports an evaluation. The intent is for this evaluation to serve as a step toward the eventual realization of systems that assist healthcare personnel in using the electronic medical record to construct patient-specific searches of Medline.
Design: The authors tested the performances of PostDoc, Finder, and a hybrid system, using text taken from randomly selected clinical records, which were stratified to include six radiology reports, six pathology reports, and six discharge summaries. They identified concepts in the clinical records that might conceivably be used in performing a patient-specific Medline search. Each system was given the free text of each record as an input. The extent to which a system-derived list of MeSH terms captured the relevant concepts in these documents was determined based on blinded assessments by the authors.
Results: PostDoc output a mean of approximately 19 MeSH terms per report, which included about 40% of the relevant report concepts. Finder output a mean of approximately 57 terms per report and captured about 45% of the relevant report concepts. A hybrid system captured approximately 66% of the relevant concepts and output about 71 terms per report.
Conclusion: The outputs of PostDoc and Finder are complementary in capturing MeSH terms from clinical free text. The results suggest possible approaches to reduce the number of terms output while maintaining the percentage of terms captured, including the use of UMLS semantic types to constrain the output list to contain only clinically relevant MeSH terms.
Pages: 62-75
Number of pages: 14