A point symmetry-based clustering technique for automatic evolution of clusters

被引:113
作者
Bandyopadhyay, Sanghamitra [1 ]
Saha, Sriparna [1 ]
机构
[1] Indian Stat Inst, Machine Intelligence Unit, Kolkata 700108, India
关键词
unsupervised classification; cluster validity index; symmetry; point symmetry-based distance; Kd-tree; genetic algorithms; real encoding;
D O I
10.1109/TKDE.2008.79
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In this paper, a new symmetry-based genetic clustering algorithm is proposed which automatically evolves the number of clusters as well as the proper partitioning from a data set. Strings comprise both real numbers and the don't care symbol in order to encode a variable number of clusters. Here, assignment of points to different clusters are done based on a point symmetry (PS)-based distance rather than the euclidean distance. A newly proposed PS-based cluster validity index, Sym-index, is used as a measure of the validity of the corresponding partitioning. The algorithm is, therefore, able to detect both convex and nonconvex clusters irrespective of their sizes and shapes as long as they possess the symmetry property. Kd-tree-based nearest neighbor search is used to reduce the complexity of computing PS-based distance. A proof on the convergence property of variable string length genetic algorithm with PS-distance-based clustering (VGAPS-clustering) technique is also provided. The effectiveness of VGAPS-clustering compared to variable string length Genetic K-means algorithm (GCUK-clustering) and one recently developed weighted sum validity function-based hybrid niching genetic algorithm (HNGA-clustering) is demonstrated for nine artificial and five real-life data sets.
引用
收藏
页码:1441 / 1457
页数:17
相关论文
共 40 条
[1]  
ANDERBERG MR, 2000, COMPUTATIONAL GEOMET
[2]  
[Anonymous], P INT C COMP THEOR A
[3]  
[Anonymous], DETECTING STABLE CLU
[4]  
[Anonymous], 1978, INTRO STAT ANAL DATA
[5]  
Asuncion D.N.A., 2007, UCI MACHINE LEARNING
[6]  
ATTNEAVE F, 1955, Am J Psychol, V68, P209, DOI 10.2307/1418892
[7]   Simulated annealing using a Reversible Jump Markov Chain Monte Carlo algorithm for fuzzy clustering [J].
Bandyopadhyay, S .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005, 17 (04) :479-490
[8]   Genetic clustering for automatic evolution of clusters and application to image classification [J].
Bandyopadhyay, S ;
Maulik, U .
PATTERN RECOGNITION, 2002, 35 (06) :1197-1208
[9]   Nonparametric genetic clustering: Comparison of validity indices [J].
Bandyopadhyay, S ;
Maulik, U .
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS, 2001, 31 (01) :120-125
[10]   GAPS: A clustering method using a new point symmetry-based distance measure [J].
Bandyopadhyay, Sanghamitra ;
Saha, Sriparna .
PATTERN RECOGNITION, 2007, 40 (12) :3430-3451