Cluster center initialization algorithm for K-means clustering

Cited by: 506
Authors
Khan, SS
Ahmad, A
Affiliations
[1] DRDO, Sci Anal Grp, Delhi 110054, India
[2] DRDO, Solid State Phys Lab, Delhi 110054, India
Keywords
K-means clustering; initial cluster centers; cost function; density based multiscale data condensation
DOI
10.1016/j.patrec.2004.04.007
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The performance of iterative clustering algorithms, which converge to one of numerous local minima, depends heavily on the initial cluster centers. Generally, initial cluster centers are selected randomly. In this paper we propose an algorithm to compute initial cluster centers for K-means clustering. The algorithm is based on two observations: some patterns are so similar to each other that they have the same cluster membership irrespective of the choice of initial cluster centers, and an individual attribute may provide some information about the initial cluster centers. The initial cluster centers computed using this methodology are found to be very close to the desired cluster centers for iterative clustering algorithms. The procedure is applicable to clustering algorithms for continuous data. We demonstrate the application of the proposed algorithm to the K-means clustering algorithm. The experimental results show improved and consistent solutions using the proposed algorithm. (C) 2004 Elsevier B.V. All rights reserved.
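The dependence on initial centers described in the abstract can be illustrated with a small sketch. This is not the initialization algorithm proposed in the paper (the record does not give its steps). It is a plain Lloyd-style K-means run twice on a hypothetical four-point data set, with the function names `sq_dist` and `kmeans` and both sets of starting centers chosen purely for illustration. It shows that two starts can converge to different local minima of the cost function.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def kmeans(points, init_centers, max_iter=100):
    """Plain Lloyd iteration from the given initial centers.

    Returns (centers, labels, cost), where cost is the sum of squared
    distances from each point to its assigned center.
    """
    centers = [tuple(c) for c in init_centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                # New center is the coordinate-wise mean of the members.
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                # Keep an empty cluster's center where it was.
                new_centers.append(old)
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    cost = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    return centers, labels, cost


# Hypothetical data: corners of a wide rectangle, K = 2.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Start near the left and right sides: converges to the left/right split.
good_centers, good_labels, good_cost = kmeans(points, [(0.0, 0.0), (4.0, 1.0)])

# Start on the bottom and top edges: stays stuck in the bottom/top split.
bad_centers, bad_labels, bad_cost = kmeans(points, [(2.0, 0.0), (2.0, 1.0)])

print(good_cost)  # 1.0
print(bad_cost)   # 16.0
```

Both runs stop at a fixed point of the iteration, but the second one has sixteen times the cost of the first, which is the kind of outcome a principled choice of initial centers is meant to avoid.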
Pages: 1293-1302
Page count: 10
Related papers
21 in total
  • [1] Anderberg M. R., 1973, CLUSTER ANAL APPL, DOI 10.1016/C2013-0-06161-0
  • [2] [Anonymous], 1992, APPL STAT
  • [3] [Anonymous], 2000, DATA MINING PRACTICA
  • [4] Bradley PS, 1997, ADV NEUR IN, V9, P368
  • [5] BRADLEY PS, 1998, P 15 INT C MACH LEAR, P91
  • [6] CATLETT J, 1991, THESIS U SYDNEY AUST
  • [7] Duda R. O., 2000, PATTERN CLASSIFICATI
  • [8] Fayyad U, 1996, AI MAG, V17, P37
  • [9] The use of multiple measurements in taxonomic problems
    Fisher, RA
    [J]. ANNALS OF EUGENICS, 1936, 7 : 179 - 188
  • [10] Fukunaga K., 1990, INTRO STAT PATTERN R