Learning domain ontologies from document warehouses and dedicated web sites

Cited by: 188
Authors
Navigli, R [1]
Velardi, P [1]
Affiliation
[1] Univ Roma La Sapienza, Dipartimento Informat, I-00198 Rome, Italy
DOI
10.1162/089120104323093276
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We present a method and a tool, OntoLearn, aimed at the extraction of domain ontologies from Web sites, and more generally from documents shared among the members of virtual organizations. OntoLearn first extracts a domain terminology from available documents. Then, complex domain terms are semantically interpreted and arranged in a hierarchical fashion. Finally, a general-purpose ontology, WordNet, is trimmed and enriched with the detected domain concepts. The major novel aspect of this approach is semantic interpretation, that is, the association of a complex concept with a complex term. This involves finding the appropriate WordNet concept for each word of a terminological string and the appropriate conceptual relations that hold among the concept components. Semantic interpretation is based on a new word sense disambiguation algorithm, called structural semantic interconnections.
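The abstract says complex domain terms are "arranged in a hierarchical fashion" but does not state the rule here. What follows is a minimal sketch, not OntoLearn's implementation. It rests on one assumption: English noun compounds are head-final, so a term's parent is its longest proper suffix that is itself an extracted term. Under that rule, "gold credit card" sits under "credit card", which sits under "card".

The functions `head_parent`, `build_hierarchy`, and `print_tree` and the sample terms are hypothetical. Terms left without a parent would be the candidates for linking to WordNet concepts in a later step, which is not shown.

```python
from typing import Dict, Iterable, Optional, Set


def head_parent(term: str, terms: Set[str]) -> Optional[str]:
    """Return the longest proper suffix of `term` that is also a known term.

    ASSUMPTION (hypothetical rule, not taken from OntoLearn): English noun
    compounds are head-final, so "credit card" is treated as a kind of "card".
    """
    words = term.split()
    # Drop leading modifiers one at a time: longest suffix first, bare head last.
    for i in range(1, len(words)):
        candidate = " ".join(words[i:])
        if candidate in terms:
            return candidate
    return None  # no known suffix: treat the term as a root


def build_hierarchy(terms: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map every term to its parent term (None for roots)."""
    term_set = set(terms)
    return {t: head_parent(t, term_set) for t in sorted(term_set)}


def print_tree(hierarchy: Dict[str, Optional[str]],
               node: Optional[str] = None, depth: int = 0) -> None:
    """Print the hierarchy as an indented tree, children in sorted order."""
    children = sorted(t for t, p in hierarchy.items() if p == node)
    for child in children:
        print("  " * depth + child)
        print_tree(hierarchy, child, depth + 1)


if __name__ == "__main__":
    sample_terms = [  # hypothetical extracted terminology
        "card", "credit card", "gold credit card",
        "service", "business service", "online business service",
    ]
    print_tree(build_hierarchy(sample_terms))
```

Run on the sample terms, this prints two trees rooted at "card" and "service": "credit card" is indented under "card" and "gold credit card" under that, and the service chain follows the same pattern. A plain suffix rule is easy to inspect, but it cannot relate terms that share no words. That gap is one reason a semantic step such as the WordNet-based interpretation described in the abstract is needed.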
Pages: 151-179
Page count: 29