ROCK: A robust clustering algorithm for categorical attributes

被引:323
作者
Guha, S [1 ]
Rastogi, R [1 ]
Shim, K [1 ]
机构
[1] Stanford Univ, Stanford, CA 94305 USA
来源
15TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS | 1999年
关键词
D O I
10.1109/ICDE.1999.754967
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We study clustering algorithms for data with boolean and categorical attributes. We show that traditional clustering algorithms that use distances between points for clustering are not appropriate for boolean and categorical attributes. Instead, We propose a novel concept of kinks to measure the similarity/proximity between a pair of data points. We develop a robust hierarchical clustering algorithm ROCK that employs links and not distances when merging clusters. Our methods naturally extend to non-metric similarity measures that are relevant in situations where a domain expert/similarity table is the only source of knowledge. In addition to presenting detailed complexity results for ROCK, we also conduct an experimental study With real-life as well as synthetic data sets. Our study shows that ROCK not only generates better quality clusters than traditional algorithms, but also exhibits good scalability properties.
引用
收藏
页码:512 / 521
页数:10
相关论文
共 10 条
  • [1] [Anonymous], 1996, P 2 INT C KNOWL DISC
  • [2] [Anonymous], 1996, P ACM SIGMOD C MAN D
  • [3] Coppersmith Don, 1987, P 19 ANN ACM S THEOR, P1
  • [4] Cormen T. H., 1990, INTRO ALGORITHMS
  • [5] GUHA S, 1998, P ACM SIGMOD C MAN D
  • [6] GUHA S, 1997, CLUSTERING ALGORITHM
  • [7] HAN EH, 1997, 1997 SIGMOD WORKSH R
  • [8] Hart P.E., 1973, Pattern recognition and scene analysis
  • [9] Jain K, 1988, Algorithms for clustering data
  • [10] NG RT, 1994, P VLDB C SANT CHIL S