The interaction of domain knowledge and linguistic structure in natural language processing: interpreting hypernymic propositions in biomedical text

Times cited: 323
Authors
Rindflesch, TC [1]
Fiszman, M [1]
Affiliation
[1] NIH, Lister Hill Natl Ctr Biomed Commun, NIH, Dept Hlth & Human Serv, Bethesda, MD 20894 USA
Keywords
natural language processing; semantic processing; knowledge representation; information extraction
DOI
10.1016/j.jbi.2003.11.003
Chinese Library Classification number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Interpretation of semantic propositions in free-text documents such as MEDLINE citations would provide valuable support for biomedical applications, and several approaches to semantic interpretation are being pursued in the biomedical informatics community. In this paper, we describe a methodology for interpreting linguistic structures that encode hypernymic propositions, in which a more specific concept is in a taxonomic relationship with a more general concept. In order to effectively process these constructions, we exploit underspecified syntactic analysis and structured domain knowledge from the Unified Medical Language System (UMLS). After introducing the syntactic processing on which our system depends, we focus on the UMLS knowledge that supports interpretation of hypernymic propositions. We first use semantic groups from the Semantic Network to ensure that the two concepts involved are compatible; hierarchical information in the Metathesaurus then determines which concept is more general and which more specific. A preliminary evaluation of a sample based on the semantic group Chemicals and Drugs provides 83% precision. An error analysis was conducted and potential solutions to the problems encountered are presented. The research discussed here serves as a paradigm for investigating the interaction between domain knowledge and linguistic structure in natural language processing, and could also make a contribution to research on automatic processing of discourse structure. Additional implications of the system we present include its integration in advanced semantic interpretation processors for biomedical text and its use for information extraction in specific domains. The approach has the potential to support a range of applications, including information retrieval and ontology engineering. Published by Elsevier Inc.
Pages: 462-477
Number of pages: 16