Profiling local optima in K-means clustering: Developing a diagnostic technique

Times cited: 71
Author
Steinley, Douglas [1]
Affiliation
[1] Univ Missouri, Dept Psychol Sci, Columbia, MO 65203 USA
Keywords
k-means clustering; local optima; cluster validation
DOI
10.1037/1082-989X.11.2.178
Chinese Library Classification number
B84 [Psychology]
Subject classification codes
04; 0402
Abstract
Using the cluster generation procedure proposed by D. Steinley and R. Henson (2005), the author investigated the performance of K-means clustering under the following scenarios: (a) different probabilities of cluster overlap; (b) different types of cluster overlap; (c) varying sample sizes, numbers of clusters, and numbers of dimensions; (d) different multivariate distributions of clusters; and (e) various multidimensional data structures. The results are evaluated in terms of the Hubert-Arabie adjusted Rand index, and several observations concerning the performance of K-means clustering are made. Finally, the article concludes by proposing a diagnostic technique that indicates when the partitioning given by a K-means cluster analysis can be trusted. By combining information from several observable characteristics of the data (number of clusters, number of variables, sample size, etc.) with the prevalence of unique local optima across several thousand implementations of the K-means algorithm, the author provides a method capable of guiding key data-analysis decisions.
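The abstract combines two computational ingredients: many K-means runs from different starting configurations, whose distinct end states are the local optima being profiled, and the Hubert-Arabie adjusted Rand index (ARI) used to score agreement between partitions. The Python sketch below illustrates both under stated assumptions; it is not the author's diagnostic technique. It assumes Lloyd's algorithm initialized from k randomly chosen data points, treats two runs as reaching the same local optimum when they produce the same partition, and uses made-up toy data. The function names (`adjusted_rand_index`, `kmeans`, `canonical`, `profile_local_optima`) are hypothetical.

```python
import random
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie adjusted Rand index between two partitions of the same objects."""
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two objects")
    # Contingency-table cell counts n_ij and the row/column marginals.
    cells = Counter(zip(labels_a, labels_b))
    sum_ij = sum(comb(c, 2) for c in cells.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # only when both partitions are identical and trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, max_iter=100):
    """Lloyd's algorithm (assumed variant) from k random data points; returns (labels, SSE)."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: a local optimum of the SSE criterion
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, sse


def canonical(labels):
    """Relabel clusters by order of first appearance so equal partitions compare equal."""
    mapping = {}
    return tuple(mapping.setdefault(lab, len(mapping)) for lab in labels)


def profile_local_optima(points, k, n_starts=1000, seed=0):
    """Tally the distinct partitions (local optima) reached over many random starts."""
    rng = random.Random(seed)
    tally, sse_of = Counter(), {}
    for _ in range(n_starts):
        labels, sse = kmeans(points, k, rng)
        key = canonical(labels)
        tally[key] += 1
        sse_of[key] = sse
    return tally, sse_of


if __name__ == "__main__":
    # Made-up toy data: three tight, well-separated groups of four points each.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (20, 0), (20, 1), (21, 0), (21, 1)]
    truth = [0] * 4 + [1] * 4 + [2] * 4
    tally, sse_of = profile_local_optima(pts, k=3, n_starts=500)
    best = min(sse_of, key=sse_of.get)
    print("distinct local optima:", len(tally))
    print("runs reaching the lowest-SSE partition:", tally[best])
    print("ARI of lowest-SSE partition vs truth:", adjusted_rand_index(best, truth))
```

Counting how many distinct partitions appear, and how often the lowest-SSE partition recurs, gives a rough picture of how many local optima the SSE surface has for a given data set; the published diagnostic goes further by relating that prevalence to observable characteristics such as the number of clusters, number of variables, and sample size.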
Pages: 178-192
Page count: 15
References (52 in total)
[1] Agresti, A. (2018). INTRO CATEGORICAL DA.
[2] Anderberg, M. R. (1973). CLUSTER ANAL APPL. DOI 10.1016/C2013-0-06161-0
[3] [Anonymous] (2005). Exploratory data analysis with MATLAB.
[4] Atlas, R. S., & Overall, J. E. (1994). Comparative evaluation of 2 superior stopping rules for hierarchical cluster analysis. Psychometrika, 59(4), 581-591.
[5] Bartholomew, D. J. (1999). LATENT VARIABLE MODE.
[6] Bayne, C. K., Beauchamp, J. J., Begovich, C. L., & Kane, V. E. (1980). Monte Carlo comparisons of selected clustering procedures. Pattern Recognition, 12(2), 51-62.
[7] Beauchaine, T. P., & Beauchaine, R. J. (2002). A comparison of maximum covariance and k-means cluster analysis in classifying cases into known taxon groups. Psychological Methods, 7(2), 245-261.
[8] Blashfield, R. K. (1988). Handbook of Multivariate Experimental Psychology (2nd ed.), p. 447. DOI 10.1007/978-1-4613-0893-5_14
[9] Brusco, M. J., & Stahl, S. (2005). Bicriterion seriation methods for skew-symmetric matrices. British Journal of Mathematical & Statistical Psychology, 58, 333-343.
[10] Brusco, M. J. (2004). Clustering binary data in the presence of masking variables. Psychological Methods, 9(4), 510-523.