A comparison of cluster validity criteria for a mixture of normal distributed data

Cited by: 24
Authors
Geva, AB [1]
Steinberg, Y [1]
Bruckmair, S [1]
Nahum, G [1]
Affiliations
[1] Ben Gurion Univ Negev, Dept Elect & Comp Engn, IL-84105 Beer Sheva, Israel
Funding
Israel Science Foundation
Keywords
cluster validity; mixture of normal distributed data; unsupervised clustering; generalized Neyman-Pearson (GNP) criterion; hypothesis testing; entropy maximization
DOI
10.1016/S0167-8655(00)00016-7
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Many validity criteria have been proposed over the years in order to validate clustering of unlabeled data sets. In this research we compared the performance of several known validity criteria to several new validity criteria for a mixture of normally distributed data. The main group of the new criteria includes modifications of the Gath and Geva partition and average density criteria, while one new criterion is based on the generalized Neyman-Pearson (GNP) test for normality. The comparison was performed by using simulated Gaussian data sets, which were built from 1 to 5 clusters in 1-4 dimensions with a variety of cluster means and variances. The clustering process was implemented by the unsupervised optimal fuzzy clustering (UOFC) algorithm that combines the fuzzy c-means (FCM) algorithm and a fuzzy modification of the maximum likelihood estimation algorithm (FMLE). We conclude that in general, there is no single validity criterion that consistently performed much better than the others under all conditions, but nevertheless we can state clearly that some of the new validity criteria showed advantages in validating most of the simulated Gaussian data sets. © 2000 Elsevier Science B.V. All rights reserved.
Pages: 511-529
Page count: 19