Taxonomic data integration from multilingual Wikipedia editions

Cited by: 10
Authors
de Melo, Gerard [1]
Weikum, Gerhard [2]
Affiliations
[1] ICSI Berkeley, Berkeley, CA 94704 USA
[2] Max Planck Inst Informat, Databases & Informat Syst Dept, D-66123 Saarbrucken, Germany
Keywords
Taxonomy induction; Multilingual; Graph; Ranking; PROPORTION
DOI
10.1007/s10115-012-0597-3
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Information systems are increasingly making use of taxonomic knowledge about words and entities. A taxonomic knowledge base may reveal that the Lago di Garda is a lake and that lakes as well as ponds, reservoirs, and marshes are all bodies of water. As the number of available taxonomic knowledge sources grows, there is a need for techniques to integrate such data into combined, unified taxonomies. In particular, the Wikipedia encyclopedia has been used by a number of projects, but its multilingual nature has largely been neglected. This paper investigates how entities from all editions of Wikipedia as well as WordNet can be integrated into a single coherent taxonomic class hierarchy. We rely on linking heuristics to discover potential taxonomic relationships, graph partitioning to form consistent equivalence classes of entities, and a Markov chain-based ranking approach to construct the final taxonomy. This results in MENTA (Multilingual Entity Taxonomy), a resource that describes 5.4 million entities and is one of the largest multilingual lexical knowledge bases currently available.
Pages: 1-39
Page count: 39