A Text Clustering Algorithm Based on the Vector Space Model

Cited by: 48

Authors:

姚清耘

刘功申

李翔

Affiliation:

[1] School of Information Security Engineering, Shanghai Jiao Tong University

Keywords:

vector space model; text clustering; corpus

DOI:

Not available

Chinese Library Classification (CLC) number:

TP391.1 [Text Information Processing]

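Abstract:

Text clustering is an important research branch of clustering: the application of clustering methods to text processing. This paper examines text clustering methods based on the vector space model and proposes an improved text clustering algorithm, the LP algorithm. Drawing on actual clustering results on a corpus, it also proposes optimizations for dimensionality determination, feature selection, and related aspects. Experiments show that the LP algorithm effectively reduces the time consumed by clustering and is both practical and flexible.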
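For readers unfamiliar with the setting, the following is a minimal sketch of text clustering in a vector space model: documents become TF-IDF vectors, cosine similarity compares them, and a generic single-pass (leader) procedure groups them. This is not the LP algorithm, which the abstract does not describe. The function names, the TF-IDF weighting, the `threshold` value, and the sample documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized document to a sparse TF-IDF vector (dict term -> weight)."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all-zero."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def single_pass_cluster(vectors, threshold=0.1):
    """Generic leader clustering (an assumption, not the paper's method):
    put each vector in the first cluster whose leader is similar enough,
    otherwise start a new cluster. Returns lists of document indices."""
    clusters = []  # clusters[k][0] is the leader of cluster k
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical toy corpus, already tokenized by whitespace.
docs = [
    "text clustering groups similar documents".split(),
    "document clustering groups similar text".split(),
    "stock prices fell sharply today".split(),
    "stock markets fell today".split(),
]
vectors = tfidf_vectors(docs)
clusters = single_pass_cluster(vectors)
print(clusters)
```

On this toy corpus the two clustering-related documents and the two stock-related documents end up in separate groups. The vocabulary size sets the dimensionality of the vector space, which is why dimensionality determination and feature selection, the optimization topics named in the abstract, directly affect both running time and cluster quality.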

Pages: 39-41, 44

Page count: 4
