A genetic algorithm using hyper-quadtrees for low-dimensional K-means clustering

被引:78
作者
Laszlo, M [1 ]
Mukherjee, S [1 ]
机构
[1] Nova SE Univ, Grad Sch Comp & Informat Sci, Ft Lauderdale, FL 33314 USA
关键词
k-means algorithm; clustering; genetic algorithms; quadtrees; optimal partition; center selection;
D O I
10.1109/TPAMI.2006.66
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The k-means algorithm is widely used for clustering because of its computational efficiency. Given n points in d-dimensional space and the number of desired clusters k, k-means seeks a set of k cluster centers so as to minimize the sum of the squared Euclidean distance between each point and its nearest cluster center. However, the algorithm is very sensitive to the initial selection of centers and is likely to converge to partitions that are significantly inferior to the global optimum. We present a genetic algorithm (GA) for evolving centers in the k-means algorithm that simultaneously identifies good partitions for a range of values around a specified k. The set of centers is represented using a hyper-quadtree constructed on the data. This representation is exploited in our GA to generate an initial population of good centers and to support a novel crossover operation that selectively passes good subsets of neighboring centers from parents to offspring by swapping subtrees. Experimental results indicate that our GA finds the global optimum for data sets with known optima and finds good solutions for large simulated data sets.
引用
收藏
页码:533 / 543
页数:11
相关论文
共 24 条
[1]  
[Anonymous], 1978, INTERACTIVE PATTERN
[2]  
[Anonymous], 1991, P 4 ICGA
[3]   A NEAR-OPTIMAL INITIAL SEED VALUE SELECTION IN K-MEANS ALGORITHM USING A GENETIC ALGORITHM [J].
BABU, GP ;
MURTY, MN .
PATTERN RECOGNITION LETTERS, 1993, 14 (10) :763-769
[4]   An evolutionary technique based on K-Means algorithm for optimal clustering in RN [J].
Bandyopadhyay, S ;
Maulik, U .
INFORMATION SCIENCES, 2002, 146 (1-4) :221-237
[5]   Nonparametric genetic clustering: Comparison of validity indices [J].
Bandyopadhyay, S ;
Maulik, U .
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS, 2001, 31 (01) :120-125
[6]  
BHUYAN JN, 1991, P 4 INT C GEN ALG, P408
[7]  
Bottou L., 1995, Advances in Neural Information Processing Systems 7, P585
[8]  
Goldberg D.E., 1989, OPTIMIZATION MACHINE
[9]  
HAMERL G, 2003, P 17 ANN C NEUR INF
[10]  
Hamerly Greg, 2002, P 11 INT C INF KNOWL