An objective approach to cluster validation

Cited by: 89
Authors
Bouguessa, Mohamed
Wang, Shengrui [1]
Sun, Haojun
Affiliations
[1] Univ Sherbrooke, Fac Sci, Dept Comp Sci, Sherbrooke, PQ J1K 2R1, Canada
[2] Hebei Univ, Coll Math & Comp Sci, Baoding 071002, Peoples R China
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
fuzzy clustering; validity index; overlapping clusters; overlap rate; truthed data set
DOI
10.1016/j.patrec.2006.01.015
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cluster validation is a major issue in cluster analysis. Many existing validity indices do not perform well when clusters overlap or there is significant variation in their covariance structure. The contribution of this paper is twofold. First, we propose a new validity index for fuzzy clustering. Second, we present a new approach for the objective evaluation of validity indices and clustering algorithms. Our validity index makes use of the covariance structure of clusters, while the evaluation approach utilizes a new concept of overlap rate that gives a formal measure of the difficulty of distinguishing between overlapping clusters. We have carried out experimental studies using data sets containing clusters of different shapes and densities and various overlap rates, in order to show how validity indices behave when clusters become less and less separable. Finally, the effectiveness of the new validity index is also demonstrated on a number of real-life data sets. (c) 2006 Elsevier B.V. All rights reserved.
Pages: 1419-1430
Page count: 12