K-means clustering: A half-century synthesis

Cited by: 708
Author
Steinley, Douglas [1]
Affiliation
[1] Univ Missouri, Dept Psychol Sci, Columbia, MO 65211 USA
DOI
10.1348/000711005X48266
Chinese Library Classification number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
This paper synthesizes the results, methodology, and research conducted concerning the K-means clustering method over the last fifty years. The K-means method is first introduced, various formulations of the minimum variance loss function and alternative loss functions within the same class are outlined, and different methods of choosing the number of clusters and initialization, variable preprocessing, and data reduction schemes are discussed. Theoretic statistical results are provided and various extensions of K-means using different metrics or modifications of the original algorithm are given, leading to a unifying treatment of K-means and some of its extensions. Finally, several future studies are outlined that could enhance the understanding of numerous subtleties affecting the performance of the K-means method.
Pages: 1-34
Page count: 34