Feature-based approaches to semantic similarity assessment of concepts using Wikipedia

Cited by: 80
Authors
Jiang, Yuncheng [1]
Zhang, Xiaopei [1]
Tang, Yong [1]
Nie, Ruihua [1]
Affiliations
[1] South China Normal University, School of Computer Science, Guangzhou 510631, Guangdong, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Concept similarity; Semantic similarity; Semantic relatedness; Feature-based measures; Wikipedia; INFORMATION-CONTENT; RELATEDNESS; DOMAIN; CONTEXT
DOI
10.1016/j.ipm.2015.01.001
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Semantic similarity assessment between concepts is an important task in many language-related applications. In the past, several approaches have been proposed that assess similarity by evaluating the knowledge modeled in one or more ontologies. However, existing measures have limitations, such as relying on predefined ontologies and being suited only to non-dynamic domains. Wikipedia provides a very large, domain-independent encyclopedic repository and semantic network for computing the semantic similarity of concepts, with more coverage than typical ontologies. In this paper, we propose several novel feature-based similarity assessment methods that depend entirely on Wikipedia and avoid most of the limitations and drawbacks described above. To implement feature-based similarity assessment using Wikipedia, we first present a formal representation of Wikipedia concepts. We then give a framework for feature-based similarity built on this formal representation. Lastly, we investigate several feature-based approaches to semantic similarity measurement that result from instantiations of the framework. The evaluation, based on several widely used benchmarks and a benchmark developed by ourselves, supports these intuitions with respect to human judgements. Overall, several methods proposed in this paper correlate well with human judgements and constitute effective ways of determining similarity between Wikipedia concepts. © 2015 Elsevier Ltd. All rights reserved.
Pages: 215-234
Number of pages: 20
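Illustrative note
The feature-based idea summarized in the abstract can be illustrated with Tversky's ratio model, a classic feature-based similarity measure. This is a minimal sketch and not necessarily any of the measures proposed in the paper: the function name, the weights `alpha` and `beta`, and the example feature sets are all hypothetical, and the record does not say how features are extracted from Wikipedia.

```python
from typing import Set


def tversky_similarity(
    features_a: Set[str],
    features_b: Set[str],
    alpha: float = 0.5,
    beta: float = 0.5,
) -> float:
    """Tversky ratio model over two feature sets.

    sim = |A & B| / (|A & B| + alpha * |A - B| + beta * |B - A|)

    alpha = beta = 1 gives the Jaccard index; alpha = beta = 0.5 gives
    the Dice coefficient. Returns 0.0 when the denominator is zero.
    """
    common = len(features_a & features_b)
    only_a = len(features_a - features_b)
    only_b = len(features_b - features_a)
    denominator = common + alpha * only_a + beta * only_b
    return common / denominator if denominator else 0.0


# Hypothetical feature sets (made up for illustration, not real Wikipedia data).
car = {"vehicle", "wheeled", "transport", "engine"}
bicycle = {"vehicle", "wheeled", "transport", "pedal"}

dice_like = tversky_similarity(car, bicycle)               # alpha = beta = 0.5
jaccard_like = tversky_similarity(car, bicycle, 1.0, 1.0)  # alpha = beta = 1
```

With these made-up sets, three features are shared and one is unique to each concept, so the choice of weights alone decides how strongly the non-shared features lower the score.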