Stability of k-means clustering

被引:45
作者
Ben-David, Shai [1 ]
Pal, Ddvid [1 ]
Simon, Hans Ulrich [2 ]
机构
[1] Univ Waterloo, David R Cheriton Sch Comp Sci, Waterloo, ON N2L 3G1, Canada
[2] Ruhr Univ Bochum, Bouchum, Germany
来源
LEARNING THEORY, PROCEEDINGS | 2007年 / 4539卷
关键词
D O I
10.1007/978-3-540-72927-3_4
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We consider the stability of k-means clustering problems. Clustering stability is a common heuristics used to determine the number of clusters in a wide variety of clustering applications. We continue the theoretical analysis of clustering stability by establishing a complete characterization of clustering stability in terms of the number of optimal 14 solutions to the clustering optimization problem. Our results complement earlier work of Ben-David, von Luxburg and Pal, by settling the main problem left open there. Our analysis shows that, for probability distributions with finite support, the stability of k-means clusterings depends solely on the number of optimal solutions to the underlying optimization problem for the data distribution. These results challenge the common belief and practice that view stability as an indicator of the validity, or meaningfulness, of the choice of a clustering algorithm and number of clusters.
引用
收藏
页码:20 / +
页数:2
相关论文
共 10 条
[1]  
[Anonymous], 2005, PASCAL WORKSH STAT O
[2]   A framework for statistical clustering with a constant time approximation algorithms for K-median clustering [J].
Ben-David, S .
LEARNING THEORY, PROCEEDINGS, 2004, 3120 :415-426
[3]   A sober look at clustering stability [J].
Ben-David, Shai ;
von Luxburg, Ulrike ;
Pal, David .
LEARNING THEORY, PROCEEDINGS, 2006, 4005 :5-19
[4]  
Ben-Hur Asa, 2002, Pac Symp Biocomput, P6
[5]  
Dudoit S, 2002, GENOME BIOL, V3
[6]   Resampling method for unsupervised estimation of cluster validity [J].
Levine, E ;
Domany, E .
NEURAL COMPUTATION, 2001, 13 (11) :2573-2593
[7]   Comparing clusterings by the variation of information [J].
Meila, M .
LEARNING THEORY AND KERNEL MACHINES, 2003, 2777 :173-187
[8]   STRONG CONSISTENCY OF K-MEANS CLUSTERING [J].
POLLARD, D .
ANNALS OF STATISTICS, 1981, 9 (01) :135-140
[9]  
RAKHLIN A, 2007, ADV NEURAL INFORM PR
[10]  
[No title captured]