Learning domain ontologies from document warehouses and dedicated web sites

Cited by: 188
Authors
Navigli, R [1]
Velardi, P [1]
Affiliation
[1] Univ Roma La Sapienza, Dipartimento Informat, I-00198 Rome, Italy
Keywords
DOI
10.1162/089120104323093276
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present a method and a tool, OntoLearn, aimed at the extraction of domain ontologies from Web sites, and more generally from documents shared among the members of virtual organizations. OntoLearn first extracts a domain terminology from available documents. Then, complex domain terms are semantically interpreted and arranged in a hierarchical fashion. Finally, a general-purpose ontology, WordNet, is trimmed and enriched with the detected domain concepts. The major novel aspect of this approach is semantic interpretation, that is, the association of a complex concept with a complex term. This involves finding the appropriate WordNet concept for each word of a terminological string and the appropriate conceptual relations that hold among the concept components. Semantic interpretation is based on a new word sense disambiguation algorithm, called structural semantic interconnections.
Pages: 151-179
Page count: 29