Automatic Cluster Number Selection using a Split and Merge K-Means Approach

Cited by: 11
Authors
Muhr, Markus [1]
Granitzer, Michael [2]
Affiliations
[1] Know Ctr Graz, Graz, Austria
[2] Graz Univ Technol, Inst Knowledge Management, Graz, Austria
Source
PROCEEDINGS OF THE 20TH INTERNATIONAL WORKSHOP ON DATABASE AND EXPERT SYSTEMS APPLICATION | 2009
Keywords
k-means; validity indices; cluster number selection; split and merge;
DOI
10.1109/DEXA.2009.39
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
The k-means method is a simple and fast clustering technique, but it requires the number of clusters to be specified in advance. We address this problem of cluster number selection with a k-means approach that exploits local changes of internal validity indices to split or merge clusters. Our split and merge k-means uses criterion functions to select the clusters to be split or merged, and fitness assessments to evaluate the resulting changes in cluster structure. Experiments on standard test data sets show that this approach selects an accurate number of clusters with reasonable runtime and accuracy.
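The general idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not name the validity index, the criterion functions, or the seeding scheme, so the sketch assumes the Calinski-Harabasz index as the internal validity index, tries every single-cluster split and every pairwise merge as candidates, and seeds each split deterministically. All function names (`kmeans`, `ch_index`, `two_seeds`, `split_merge_kmeans`) and the toy data are hypothetical.

```python
import math

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def kmeans(points, centers, iters=100):
    """Lloyd iterations from the given centers; empty clusters are dropped."""
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        clusters = [c for c in clusters if c]
        new_centers = [mean(c) for c in clusters]
        if new_centers == centers:
            break
        centers = new_centers
    return clusters

def ch_index(clusters):
    """Calinski-Harabasz index (assumed here as the validity index); higher is better."""
    n, k = sum(len(c) for c in clusters), len(clusters)
    if k < 2 or n <= k:
        return -math.inf
    overall = mean([p for c in clusters for p in c])
    within = sum(sq_dist(p, mean(c)) for c in clusters for p in c)
    between = sum(len(c) * sq_dist(mean(c), overall) for c in clusters)
    if within == 0:
        return math.inf
    return (between / (k - 1)) / (within / (n - k))

def two_seeds(points):
    """Deterministic seeds (an assumption): the first point and the point farthest from it."""
    first = points[0]
    return [first, max(points, key=lambda p: sq_dist(p, first))]

def split_merge_kmeans(points, max_rounds=20):
    """Start with k = 2, then accept the best split or merge while the index improves."""
    clusters = kmeans(points, two_seeds(points))
    score = ch_index(clusters)
    for _ in range(max_rounds):
        candidates = []
        for i, c in enumerate(clusters):  # candidate: split cluster i with 2-means
            halves = kmeans(c, two_seeds(c))
            if len(halves) == 2:
                candidates.append(clusters[:i] + clusters[i + 1:] + halves)
        if len(clusters) > 2:  # candidate: merge clusters i and j
            for i in range(len(clusters)):
                for j in range(i + 1, len(clusters)):
                    rest = [c for m, c in enumerate(clusters) if m not in (i, j)]
                    candidates.append(rest + [clusters[i] + clusters[j]])
        best = max(candidates, key=ch_index, default=None)
        if best is None or ch_index(best) <= score:
            break  # no structural change improves the index
        # Refine the accepted structure with a global k-means pass.
        clusters = kmeans(points, [mean(c) for c in best])
        score = ch_index(clusters)
    return clusters

def blob(cx, cy):
    """Toy 3x3 grid of points around (cx, cy)."""
    return [(cx + dx, cy + dy) for dx in (-0.5, 0.0, 0.5) for dy in (-0.5, 0.0, 0.5)]

points = blob(0, 0) + blob(6, 0) + blob(20, 0)
clusters = split_merge_kmeans(points)
print(len(clusters), sorted(len(c) for c in clusters))
```

The sketch evaluates every candidate exhaustively, which is simple but costly; the paper's criterion functions presumably narrow the candidates before the fitness assessment, a step this example does not attempt to reproduce.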
Pages: 363 / +
Page count: 2