Chinese Word Sense Induction Based on an Improved k-means Algorithm

Cited by: 7

Authors:

Zhang Yihao (张宜浩) [1,2]

Jin Peng (金澎) [1,2]

Sun Rui (孙锐) [1,2]

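Affiliations:

[1] School of Computer Science, Leshan Normal University (乐山师范学院)

[2] Laboratory of Intelligent Information Processing and Application, Leshan Normal University (乐山师范学院)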

Source:

Journal of Computer Applications (计算机应用) | 2012 / Vol. 32 / No. 05

Keywords:

word sense induction; k-means algorithm; clustering; Tongyici Cilin (同义词词林)

DOI:

Not available

Chinese Library Classification (CLC) number:

TP391.1 [Text information processing]

Subject classification code:

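Abstract:

Polysemy is pervasive in Chinese. Word sense induction groups the occurrences of a word that carry the same meaning across different contexts, which makes it essentially a clustering problem. Unsupervised clustering is currently the prevailing approach to word sense induction. This paper proposes an improved k-means algorithm that modifies two steps, the selection of the initial cluster centers and the computation of the cluster means, and thereby partly overcomes the sensitivity of k-means to noise and outliers. For feature representation, the category codes of words in Tongyici Cilin (同义词词林) are used to reduce the feature dimensionality. Experiments show that the improved k-means algorithm yields a considerable performance gain, reaching an F-Score of 75.8%.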
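
The abstract names the two steps that are modified (initial center selection and cluster mean computation) but gives no details. The Python sketch below is therefore only an illustration of how such changes can reduce sensitivity to outliers, not the authors' algorithm. It assumes that seeds are chosen far apart after the most isolated points are set aside, and that each cluster mean is a trimmed mean. The function names and the `trim` parameter are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def trimmed_mean(members, center, trim):
    """Mean of the members after dropping the `trim` fraction farthest from `center`."""
    ranked = sorted(members, key=lambda p: dist(p, center))
    kept = ranked[:len(ranked) - int(trim * len(ranked))]
    return [sum(col) / len(kept) for col in zip(*kept)]

def init_centers(points, k, trim):
    """Pick k spread-out seeds while skipping suspected outliers (assumed heuristic)."""
    n = len(points)
    # Average distance to the other points; isolated points score highest.
    avg = [sum(dist(p, q) for q in points) / (n - 1) for p in points]
    order = sorted(range(n), key=lambda i: avg[i])
    candidates = [points[i] for i in order[:n - int(trim * n)]]
    centers = [candidates[0]]  # start from the most central candidate
    while len(centers) < k:
        # Add the candidate farthest from the seeds chosen so far.
        centers.append(max(candidates, key=lambda p: min(dist(p, c) for c in centers)))
    return [list(c) for c in centers]

def robust_kmeans(points, k, trim=0.25, max_iter=100):
    """k-means with outlier-aware seeding and trimmed-mean updates."""
    centers = init_centers(points, k, trim)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = trimmed_mean(members, centers[j], trim)
    return labels, centers

# Toy data: two tight groups plus one isolated point at (50, 50).
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11), (50, 50)]
labels, centers = robust_kmeans(points, k=2)
```

On this toy data the isolated point is still assigned to a cluster, but it is excluded when that cluster's mean is updated, so the center stays near the dense group instead of being pulled toward the outlier. In the setting described by the abstract, each point would be a context vector built from Tongyici Cilin category codes rather than a 2-D coordinate.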


Pages: 1332-1334

Page count: 3

