Profiling local optima in K-means clustering: Developing a diagnostic technique

Cited by: 71
Author
Steinley, Douglas [1]
Affiliation
[1] Univ Missouri, Dept Psychol Sci, Columbia, MO 65203 USA
Keywords
k-means clustering; local optima; cluster validation
DOI
10.1037/1082-989X.11.2.178
Chinese Library Classification (CLC) number
B84 [Psychology]
Discipline classification code
04; 0402
Abstract
Using the cluster generation procedure proposed by D. Steinley and R. Henson (2005), the author investigated the performance of K-means clustering under the following scenarios: (a) different probabilities of cluster overlap; (b) different types of cluster overlap; (c) varying sample sizes, clusters, and dimensions; (d) different multivariate distributions of clusters; and (e) various multidimensional data structures. The results are evaluated in terms of the Hubert-Arabie adjusted Rand index, and several observations concerning the performance of K-means clustering are made. Finally, the article concludes with the proposal of a diagnostic technique indicating when the partitioning given by a K-means cluster analysis can be trusted. By combining the information from several observable characteristics of the data (number of clusters, number of variables, sample size, etc.) with the prevalence of unique local optima in several thousand implementations of the K-means algorithm, the author provides a method capable of guiding key data-analysis decisions.
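The diagnostic described in the abstract rests on two ingredients: rerunning K-means from many random starts to see how many distinct local optima appear, and scoring agreement between partitions with the Hubert-Arabie adjusted Rand index. The Python sketch below illustrates both under stated assumptions and is not the author's procedure. It assumes plain Lloyd iterations started from k randomly chosen data points. It treats two runs as reaching the same optimum when their partitions match after relabeling. The helper names (`kmeans`, `canonical`, `profile_local_optima`, `adjusted_rand_index`) are hypothetical. It does not reproduce the article's rule for combining data characteristics into a trust decision.

```python
import random
from collections import Counter
from math import comb


def adjusted_rand_index(a, b):
    """Hubert-Arabie adjusted Rand index between two labelings of the same objects."""
    n = len(a)
    sum_ij = sum(comb(v, 2) for v in Counter(zip(a, b)).values())  # contingency cells
    sum_a = sum(comb(v, 2) for v in Counter(a).values())            # row marginals
    sum_b = sum(comb(v, 2) for v in Counter(b).values())            # column marginals
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def sq_dist(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm from k randomly chosen data points (assumed start rule)."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments stable: a local optimum
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, sse


def canonical(labels):
    """Relabel clusters by order of first appearance so permuted labelings compare equal."""
    mapping = {}
    return tuple(mapping.setdefault(lab, len(mapping)) for lab in labels)


def profile_local_optima(points, k, n_starts=1000, seed=0):
    """Run K-means from many random starts and tally the distinct partitions reached."""
    rng = random.Random(seed)
    tally = Counter()
    sse_of = {}
    for _ in range(n_starts):
        labels, sse = kmeans(points, k, rng)
        part = canonical(labels)
        tally[part] += 1
        sse_of[part] = sse
    best = min(sse_of, key=sse_of.get)  # lowest within-cluster SSE found
    return {
        "unique_optima": len(tally),
        "share_at_best": tally[best] / n_starts,
        "best_partition": best,
    }


if __name__ == "__main__":
    gen = random.Random(1)
    truth = [0] * 20 + [1] * 20
    data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(20)] + \
           [(gen.gauss(8, 1), gen.gauss(8, 1)) for _ in range(20)]
    summary = profile_local_optima(data, k=2, n_starts=200)
    print(summary["unique_optima"], summary["share_at_best"])
    print(adjusted_rand_index(truth, summary["best_partition"]))
```

In the spirit of the abstract, a single dominant partition (few unique optima and a large `share_at_best`) points to a K-means solution that is stable across starts, while many distinct optima call for caution. How these quantities should be weighed against sample size, number of clusters, and number of variables is the subject of the article and is not encoded in this sketch.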
Pages: 178-192
Number of pages: 15
Related papers (52 in total)
[41] Morey, L. C., Blashfield, R. K., & Skinner, H. A. (1983). A comparison of cluster-analysis techniques within a sequential validation framework. Multivariate Behavioral Research, 18(3), 309-329.
[42] Price, L. J. (1993). Identifying cluster overlap with NORMIX population membership probabilities. Multivariate Behavioral Research, 28(2), 235-262.
[43] Steinley, D., & Henson, R. (2005). OCLUS: An analytic method for generating clusters with known overlap. Journal of Classification, 22(2), 221-250.
[44] Steinley, D. (2004). ST CLASS DAT ANAL, p. 53.
[45] Steinley, D. (2004). Properties of the Hubert-Arabie adjusted Rand index. Psychological Methods, 9(3), 386-396.
[46] Steinley, D. (2003). Local optima in K-means clustering: What you don't know may hurt you. Psychological Methods, 8(3), 294-304.
[47] Steinley, D. (in press). BRIT J MATH.
[48] Steinley, D. (2005). INITIALIZING K MEANS.
[49] Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63, 411-423.
[50] Tonidandel, S., & Overall, J. E. (2004). Determining the number of clusters by sampling with replacement. Psychological Methods, 9(2), 238-249.