A comparative study of efficient initialization methods for the k-means clustering algorithm

被引:837
作者
Celebi, M. Emre [1 ]
Kingravi, Hassan A. [2 ]
Vela, Patricio A. [2 ]
机构
[1] Louisiana State Univ, Dept Comp Sci, Shreveport, LA 71105 USA
[2] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
基金
美国国家科学基金会;
关键词
Partitional clustering; Sum of squared error criterion; k-means; Cluster center initialization; ENCODING ALGORITHM; QUANTIZATION; PERFORMANCE; SETS;
D O I
10.1016/j.eswa.2012.07.021
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
K-means is undoubtedly the most widely used partitional clustering algorithm. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of the cluster centers. Numerous initialization methods have been proposed to address this problem. In this paper, we first present an overview of these methods with an emphasis on their computational efficiency. We then compare eight commonly used linear time complexity initialization methods on a large and diverse collection of data sets using various performance criteria. Finally, we analyze the experimental results using nonparametric statistical tests and provide recommendations for practitioners. We demonstrate that popular initialization methods often perform poorly and that there are in fact strong alternatives to these methods. (C) 2012 Elsevier Ltd. All rights reserved.
引用
收藏
页码:200 / 210
页数:11
相关论文
共 76 条
  • [1] Robust partitional clustering by outlier and density insensitive seeding
    Al Hasan, Mohammad
    Chaoji, Vineet
    Salem, Saeed
    Zaki, Mohammed J.
    [J]. PATTERN RECOGNITION LETTERS, 2009, 30 (11) : 994 - 1002
  • [2] Al-Daoud MB, 2005, PROC WRLD ACAD SCI E, V4, P74
  • [3] New methods for the initialisation of clusters
    AlDaoud, MB
    Roberts, SA
    [J]. PATTERN RECOGNITION LETTERS, 1996, 17 (05) : 451 - 455
  • [4] Aloise D., 2010, MATH PROGRAM, P1
  • [5] NP-hardness of Euclidean sum-of-squares clustering
    Aloise, Daniel
    Deshpande, Amit
    Hansen, Pierre
    Popat, Preyas
    [J]. MACHINE LEARNING, 2009, 75 (02) : 245 - 248
  • [6] Anderberg M.R., 1973, CLUSTER ANAL APPL, DOI DOI 10.1016/C2013-0-06161-0
  • [7] [Anonymous], 1970, SPEECH ANAL CLUSTERI
  • [8] Arthur D, 2007, PROCEEDINGS OF THE EIGHTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P1027
  • [9] A NEAR-OPTIMAL INITIAL SEED VALUE SELECTION IN K-MEANS ALGORITHM USING A GENETIC ALGORITHM
    BABU, GP
    MURTY, MN
    [J]. PATTERN RECOGNITION LETTERS, 1993, 14 (10) : 763 - 769
  • [10] BABU PG, 1994, INDIAN J PURE AP MAT, V25, P85