On clustering validation techniques

Cited by: 1653
Authors
Halkidi, M [1]
Batistakis, Y [1]
Vazirgiannis, M [1]
Affiliations
[1] Athens Univ Econ & Business, Dept Informat, Athens 10434, Greece
Keywords
clustering algorithms; unsupervised learning; cluster validity; validity indices;
DOI
10.1023/A:1012801612483
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cluster analysis aims at identifying groups of similar objects and therefore helps to discover the distribution of patterns and interesting correlations in large data sets. It has been the subject of wide research since it arises in many application domains in engineering, business and the social sciences. In recent years especially, the availability of huge transactional and experimental data sets and the emerging requirements of data mining have created a need for clustering algorithms that scale and can be applied in diverse domains. This paper introduces the fundamental concepts of clustering and surveys the widely known clustering algorithms in a comparative way. Moreover, it addresses an important issue of the clustering process: the quality assessment of clustering results, which is also related to the inherent features of the data set under consideration. A review of the clustering validity measures and approaches available in the literature is presented. Furthermore, the paper illustrates the issues that are under-addressed by recent algorithms and outlines trends in the clustering process.
Pages: 107-145
Number of pages: 39