A K-means Algorithm for Quickly Finding Initial Cluster Centers

Cited by: 19

Authors:

Cao Zhiyu (曹志宇)

Zhang Zhonglin (张忠林)

Li Yuantao (李元韬)

Affiliation:

[1] School of Electronic and Information Engineering, Lanzhou Jiaotong University

Source:

Journal of Lanzhou Jiaotong University | 2009, Vol. 28, No. 06

Keywords:

clustering; data samples; Euclidean distance; k-means algorithm; cluster centers

DOI:

Not available

Chinese Library Classification (CLC) number:

TP391.41

Discipline classification code:

080203

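Abstract:

The traditional k-means algorithm is highly sensitive to the initial cluster centers: the clustering result fluctuates with the initial input and easily falls into a local optimum. To eliminate this sensitivity, a new method for the k-means algorithm is proposed that selects the initial cluster centers based on the distribution of the data samples. Experiments on data sets from the public UCI repository show that the improved k-means algorithm produces higher-quality clustering results and eliminates the sensitivity to the initial input.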
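The sensitivity described in the abstract can be illustrated with a minimal sketch of standard (Lloyd-style) k-means in plain Python. This is not the paper's proposed initialization method, which the abstract does not specify. The function names `sq_dist` and `kmeans`, the four-point toy data set, and the two sets of starting centers are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: Sequence[Point], centers: Sequence[Point]) -> List[int]:
    """Label each point with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
        for p in points
    ]


def kmeans(
    points: Sequence[Point],
    init_centers: Sequence[Point],
    max_iter: int = 100,
) -> Tuple[List[Point], List[int], float]:
    """Standard k-means started from caller-supplied initial centers.

    Returns the final centers, the point labels, and the sum of
    squared errors (SSE) of the final partition.
    """
    centers = [tuple(c) for c in init_centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                # New center is the coordinate-wise mean of the members.
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Keep an empty cluster's center where it was.
                new_centers.append(centers[j])
        if new_centers == centers:
            break  # Converged: the centers no longer move.
        centers = new_centers
    labels = assign(points, centers)
    sse = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    return centers, labels, sse


# Toy data (an assumption): corners of a 4-by-1 rectangle.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Start A: one center on each short side, giving a left/right split.
_, labels_a, sse_a = kmeans(points, [(0.0, 0.0), (4.0, 0.0)])

# Start B: centers on the long sides, which gets stuck in a top/bottom split.
_, labels_b, sse_b = kmeans(points, [(2.0, 0.0), (2.0, 1.0)])

print(labels_a, sse_a)
print(labels_b, sse_b)
```

With the same data and the same number of clusters, the two starting points converge to different partitions with different SSE values, which is the kind of local optimum that a distribution-based choice of initial centers is intended to avoid.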

Pages: 15-18

Page count: 4

References (8 in total):

[1]

Learning simple relations: theory and applications. Berkhin P, Becher J. Proceedings of the 2nd SIAM ICDM. 2002

[2]

Refining clusters in high-dimensional text data. Dhillon I S, Guan Y, Kogan J. Proceedings of the Workshop on Clustering High Dimensional Data and its Applications at the Second SIAM International Conference on Data Mining. 2002

[3]

Generalized K-Harmonic Means: Dynamic Weighting of Data in Unsupervised Learning. Zhang B. Proc of the 1st SIAM International Conference on Data Mining. 2001

[4]

A Genetic Rule-Based Data Clustering Toolkit. Sarafis I, Zalzala A M S, Trinder P W. Proc of the Congress on Evolutionary Computation. 2002

[5]

A Scalable approach to balanced, high-dimensional clustering of market baskets. Strehl A, Ghosh J. Proceedings of the 17th International Conference on High Performance Computing. 2000

[6]

On Scaling up Balanced Clustering Algorithms. Banerjee A, Ghosh J. Proceedings of the 2nd SIAM ICDM. 2002

[7]

Some methods for classification and analysis of multivariate observations. MacQueen J. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. 1967

[8]

X-means: extending K-means with efficient estimation of the number of clusters. Pelleg D, Moore A. Proceedings of the 17th ICML. 2000
