An approach for measuring semantic similarity between words using multiple information sources

Cited by: 36

Authors:

Li, YH

Bandar, ZA

McLean, D

Affiliations:

[1] Univ Manchester, Manchester Sch Engn, Manchester M13 9PL, Lancs, England

[2] Manchester Metropolitan Univ, Intelligent Syst Grp, Dept Comp & Math, Manchester M1 5GD, Lancs, England

Source:

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING | 2003 / Vol. 15 / No. 4

Keywords:

semantic similarity; lexical database; information content; corpus statistics

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

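Discipline classification codes:

081104 [Pattern recognition and intelligent systems]; 0812 [Computer science and technology]; 0835 [Software engineering]; 1405 [Intelligent science and technology]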

Abstract:

Semantic similarity between words is becoming a generic problem for many applications of computational linguistics and artificial intelligence. This paper explores the determination of semantic similarity by a number of information sources, which consist of structural semantic information from a lexical taxonomy and information content from a corpus. To investigate how information sources could be used effectively, a variety of strategies for using various possible information sources are implemented. A new measure is then proposed which combines information sources nonlinearly. Experimental evaluation against a benchmark set of human similarity ratings demonstrates that the proposed measure significantly outperforms traditional similarity measures.

Pages: 871-882

Page count: 12

25 references in total

[1]

Abney Steven, 1999, P ACL WORKSH UNS LEA, P1

[2]

Agirre E., 1996, P 16 INT C COMP LING

[3]

Asymmetries of comparison [J].