Disambiguating ambiguous biomedical terms in biomedical narrative text: An unsupervised method

Cited by: 45
Authors
Liu, HF [1]
Lussier, YA
Friedman, C
Affiliations
[1] CUNY Grad Sch & Univ Ctr, Div Comp Sci, New York, NY 10016 USA
[2] CUNY Queens Coll, Dept Comp Sci, New York, NY USA
[3] Columbia Univ Coll Phys & Surg, Dept Med Informat, New York, NY 10032 USA
Keywords
natural language processing; word sense disambiguation; corpus-based machine learning; MedLEE; UMLS; MEDLINE
DOI
10.1006/jbin.2001.1023
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
With the growing use of Natural Language Processing (NLP) techniques for information extraction and concept indexing in the biomedical domain, a method that quickly and efficiently assigns the correct sense of an ambiguous biomedical term in a given context is needed concurrently. The current status of word sense disambiguation (WSD) in the biomedical domain is that handcrafted rules are used based on contextual material. The disadvantages of this approach are (i) generating WSD rules manually is a time-consuming and tedious task, (ii) maintenance of rule sets becomes increasingly difficult over time, and (iii) handcrafted rules are often incomplete and perform poorly in new domains comprised of specialized vocabularies and different genres of text. This paper presents a two-phase unsupervised method to build a WSD classifier for an ambiguous biomedical term W. The first phase automatically creates a sense-tagged corpus for W, and the second phase derives a classifier for W using the derived sense-tagged corpus as a training set. A formative experiment was performed, which demonstrated that classifiers trained on the derived sense-tagged corpora achieved an overall accuracy of about 97%, with greater than 90% accuracy for each individual ambiguous term. © 2001 Elsevier Science (USA).
Pages: 249-261
Page count: 13