Investigating diversity of clustering methods: An empirical comparison

Cited by: 169
Authors
Gelbard, Roy [1]
Goldman, Orit
Spiegler, Israel
Affiliations
[1] Bar Ilan Univ, Grad Sch Business Adm, Informat Syst Program, IL-52900 Ramat Gan, Israel
[2] Tel Aviv Univ, Recanati Grad Sch Business Adm, Technol & Informat Syst Program, IL-69978 Tel Aviv, Israel
Keywords
cluster analysis; similarity; binary-positive data representation
DOI
10.1016/j.datak.2007.01.002
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The paper aims to shed some light on the question of why clustering algorithms, despite being quantitative and hence supposedly objective in nature, yield different and varied results. To do that, we took 10 common clustering algorithms and tested them over four known datasets, used in the literature as baselines with agreed-upon clusters. One additional method, Binary-Positive, developed by our team, was added to the analysis. The results affirm the unpredictable nature of the clustering process and point to the different assumptions made by different methods. One conclusion of the study is to carefully choose the appropriate clustering method for any given application. (c) 2007 Elsevier B.V. All rights reserved.
Pages: 155-166
Number of pages: 12