A K-means Algorithm with Fast Selection of Initial Cluster Centers

Cited by: 19
Authors
Cao Zhiyu (曹志宇)
Zhang Zhonglin (张忠林)
Li Yuantao (李元韬)
Affiliation
[1] School of Electronic and Information Engineering, Lanzhou Jiaotong University
Keywords
clustering; data samples; Euclidean distance; k-means algorithm; cluster centers
DOI
Not available
Chinese Library Classification (CLC) number
TP391.41
Discipline classification code
080203
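Abstract
The traditional k-means algorithm is highly sensitive to its initial cluster centers: the clustering result fluctuates with different initial inputs, and the algorithm easily becomes trapped in a local optimum. To eliminate this sensitivity, a new method for k-means is proposed that selects the initial cluster centers based on the distribution of the data samples. Experiments on data sets from the public UCI repository show that the improved k-means algorithm produces higher-quality clustering results and eliminates the sensitivity to the initial input.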
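The sensitivity described in the abstract can be illustrated with a minimal sketch. The record does not give the details of the authors' selection method, so the code below does not reproduce it. It only shows plain k-means reaching different results from different initial centers on a toy data set. The `farthest_first` helper is a hypothetical stand-in: a generic distance-based seeding heuristic that is an assumption of this example, not the method proposed in the paper. The data set and all function names are likewise invented for illustration.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd-style k-means. Returns (final_centers, sse)."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center (Euclidean distance).
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[idx].append(p)
        # Update step: each center moves to the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    # Sum of squared distances from each point to its nearest center.
    sse = sum(min(math.dist(p, c) for c in centers) ** 2 for p in points)
    return centers, sse


def farthest_first(points, k):
    """Hypothetical seeding (NOT the paper's method): start from the first
    point, then repeatedly add the point farthest from the chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


# Toy data: two tight pairs of points, far apart along the x axis.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Both initial centers taken from the left pair: converges to a poor local optimum.
_, sse_bad = kmeans(points, [(0, 0), (0, 1)])

# One initial center from each pair: converges to the natural left/right split.
_, sse_good = kmeans(points, [(0, 0), (4, 0)])

# Distance-based seeding picks centers from both pairs without manual choice.
seeded_centers, sse_seeded = kmeans(points, farthest_first(points, 2))

print(sse_bad, sse_good, sse_seeded)  # 16.0 1.0 1.0
```

On this toy input the first run stays stuck with a sum of squared errors of 16.0, while the other two reach 1.0. Any deterministic seeding removes the run-to-run fluctuation caused by random initialization; whether it also yields good clusters depends on how the seeds relate to the data distribution, which is the question the paper addresses.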
Pages: 15-18
Page count: 4