Using automatically learnt verb selectional preferences for classification of biomedical terms

Cited by: 4
Authors
Spasic, I
Ananiadou, S
Affiliations
[1] Univ Manchester, Dept Chem, Manchester M60 1QD, Lancs, England
[2] Univ Salford, Sch Comp Sci & Engn, Salford M5 4WT, Lancs, England
[3] NaCTeM, Manchester, Lancs, England
Keywords
term recognition; term classification; ontologies; machine learning; genetic algorithms; similarity measures; corpus processing
DOI
10.1016/j.jbi.2004.08.002
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
In this paper, we present an approach to term classification based on verb selectional patterns (VSPs), where such a pattern is defined as a set of semantic classes that could be used in combination with a given domain-specific verb. VSPs have been automatically learnt based on the information found in a corpus and an ontology in the biomedical domain. Prior to the learning phase, the corpus is terminologically processed: term recognition is performed by both looking up the dictionary of terms listed in the ontology and applying the C/NC-value method for on-the-fly term extraction. Subsequently, domain-specific verbs are automatically identified in the corpus based on the frequency of occurrence and the frequency of their co-occurrence with terms. VSPs are then learnt automatically for these verbs. Two machine learning approaches are presented. The first approach has been implemented as an iterative generalisation procedure based on a partial order relation induced by the domain-specific ontology. The second approach exploits the idea of genetic algorithms. Once the VSPs are acquired, they can be used to classify newly recognised terms co-occurring with domain-specific verbs. Given a term, the most frequently co-occurring domain-specific verb is selected. Its VSP is used to constrain the search space by focusing on potential classes of the given term. A nearest-neighbour approach is then applied to select a class from the constrained space of candidate classes. The most similar candidate class is predicted for the given term. The similarity measure used for this purpose combines contextual, lexical, and syntactic properties of terms. © 2004 Elsevier Inc. All rights reserved.
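The classification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy data, and the `token_overlap` similarity are all hypothetical. In particular, the paper's similarity measure combines contextual, lexical, and syntactic properties, whereas this stand-in uses only lexical token overlap (Jaccard).

```python
from collections import Counter


def token_overlap(a, b):
    """Stand-in similarity (assumption): Jaccard overlap of lowercase tokens."""
    x, y = set(a.lower().split()), set(b.lower().split())
    return len(x & y) / len(x | y) if x | y else 0.0


def classify_term(term, cooccurrences, vsps, class_members, similarity):
    """Predict a class for `term`, or None if it has no domain-specific verb."""
    # 1. Select the domain-specific verb that co-occurs most often with the term.
    verb_counts = Counter(v for t, v in cooccurrences if t == term and v in vsps)
    if not verb_counts:
        return None
    verb, _ = verb_counts.most_common(1)[0]

    # 2. The verb's VSP constrains the search to a set of candidate classes.
    candidates = vsps[verb]

    # 3. Nearest neighbour: pick the candidate class holding the most similar
    #    already-classified term.
    best_class, best_score = None, float("-inf")
    for cls in sorted(candidates):
        for known in class_members.get(cls, ()):
            score = similarity(term, known)
            if score > best_score:
                best_class, best_score = cls, score
    return best_class


# Toy data (hypothetical): (term, verb) co-occurrences, VSPs, and class members.
cooccurrences = [
    ("retinoic acid receptor", "bind"),
    ("retinoic acid receptor", "bind"),
    ("retinoic acid receptor", "express"),
]
vsps = {"bind": {"protein", "dna"}, "express": {"gene"}}
class_members = {
    "protein": ["androgen receptor", "glucocorticoid receptor"],
    "dna": ["promoter region"],
    "gene": ["retinoic acid receptor gene"],
}

predicted = classify_term(
    "retinoic acid receptor", cooccurrences, vsps, class_members, token_overlap
)
print(predicted)
```

In this toy setup the class `gene` contains the lexically closest known term, but it is never considered, because the VSP of the most frequent verb (`bind`) restricts the candidates to `protein` and `dna`.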
Pages: 483-497
Page count: 15