Performance evaluation of some clustering algorithms and validity indices

Cited by: 949
Authors
Maulik, U [1]
Bandyopadhyay, S [2]
Affiliations
[1] Univ Texas, Dept Comp Sci & Engn, Arlington, TX 76019 USA
[2] Indian Stat Inst, Machine Intelligence Unit, Kolkata 700108, W Bengal, India
Keywords
unsupervised classification; Euclidean distance; K-Means algorithm; single linkage algorithm; validity index; simulated annealing
DOI
10.1109/TPAMI.2002.1114856
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this article, we evaluate the performance of three clustering algorithms, hard K-Means, single linkage, and a simulated annealing (SA) based technique, in conjunction with four cluster validity indices, namely Davies-Bouldin index, Dunn's index, Calinski-Harabasz index, and a recently developed index I. Based on a relation between the index I and the Dunn's index, a lower bound of the value of the former is theoretically estimated in order to get unique hard K-partition when the data set has distinct substructures. The effectiveness of the different validity indices and clustering methods in automatically evolving the appropriate number of clusters is demonstrated experimentally for both artificial and real-life data sets with the number of clusters varying from two to ten. Once the appropriate number of clusters is determined, the SA-based clustering technique is used for proper partitioning of the data into the said number of clusters.
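As a rough illustration of how a validity index can pick the number of clusters, the sketch below scores hard K-Means partitions with the Davies-Bouldin index (lower is better) and Dunn's index (higher is better) over a range of K. It is not the authors' procedure: the toy data, the farthest-point seeding, and the helper names (`kmeans`, `groups`, `score_partitions`) are assumptions made for this example. The SA-based technique is not reproduced, and index I is left out because this record does not give its formula.

```python
import math
from itertools import combinations

# Hypothetical toy data: three well-separated 2-D groups (not from the paper).
OFFSETS = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5)]
POINTS = [(cx + dx, cy + dy)
          for cx, cy in [(0, 0), (10, 0), (5, 10)]
          for dx, dy in OFFSETS]

def centroid(members):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(members) for axis in zip(*members))

def kmeans(points, k, max_iter=100):
    """Hard K-Means; farthest-point seeding is an assumption for determinism."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = centroid(members)
    return labels

def groups(points, labels):
    """Collect points into a list of non-empty clusters."""
    out = {}
    for p, lab in zip(points, labels):
        out.setdefault(lab, []).append(p)
    return list(out.values())

def davies_bouldin(clusters):
    """Mean over clusters of the worst (scatter_i + scatter_j) / center distance."""
    centers = [centroid(c) for c in clusters]
    scatter = [sum(math.dist(p, ctr) for p in c) / len(c)
               for c, ctr in zip(clusters, centers)]
    n = len(clusters)
    return sum(max((scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
                   for j in range(n) if j != i)
               for i in range(n)) / n

def dunn(clusters):
    """Smallest between-cluster point distance over largest cluster diameter."""
    separation = min(math.dist(p, q)
                     for a, b in combinations(clusters, 2) for p in a for q in b)
    diameter = max(math.dist(p, q) for c in clusters for p in c for q in c)
    return separation / diameter

def score_partitions(points, k_values):
    """Map each K to (Davies-Bouldin, Dunn) for the K-Means partition."""
    scores = {}
    for k in k_values:
        clusters = groups(points, kmeans(points, k))
        scores[k] = (davies_bouldin(clusters), dunn(clusters))
    return scores

scores = score_partitions(POINTS, range(2, 7))
best_db = min(scores, key=lambda k: scores[k][0])    # lower is better
best_dunn = max(scores, key=lambda k: scores[k][1])  # higher is better
print(best_db, best_dunn)
```

On this toy data both indices should select K = 3, the number of groups the data was built with. The abstract's experiments sweep K from two to ten; the range here is shorter only because the toy set has fifteen points.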
Pages: 1650-1654
Page count: 5