Automatic subspace clustering of high dimensional data

Cited by: 232
Authors
Agrawal, R
Gehrke, J
Gunopulos, D
Raghavan, P
Affiliations
[1] IBM Corp, Almaden Res Ctr, San Jose, CA 95120 USA
[2] Cornell Univ, Dept Comp Sci, Ithaca, NY 14853 USA
[3] Ver Inc, Sunnyvale, CA 94089 USA
Keywords
subspace clustering; clustering; dimensionality reduction
DOI
10.1007/s10618-005-1396-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Data mining applications place special requirements on clustering algorithms, including the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces of maximum dimensionality. It generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. It produces identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for the data distribution. Through experiments, we show that CLIQUE efficiently finds accurate clusters in large high dimensional datasets.
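The abstract says what CLIQUE produces but not how it searches. The following is a minimal, hedged sketch of a grid-based, bottom-up subspace search that yields the kind of output described. It assumes that data are scaled to [0, 1], that each dimension is cut into `xi` equal-width intervals, and that a grid unit is dense when more than a fraction `tau` of all points fall in it. Under those assumptions, k-dimensional candidates are joined from dense (k-1)-dimensional units, and adjacent dense units in one subspace form a cluster. `xi`, `tau`, and every function name are hypothetical. The DNF output is left unminimized. Because every step only counts points, the result does not depend on input order.

```python
from collections import Counter
from itertools import combinations


def cell_of(point, xi):
    # Equal-width interval index per dimension; assumes coordinates in [0, 1].
    return tuple(min(int(x * xi), xi - 1) for x in point)


def dense_units(points, xi, tau):
    """Bottom-up (Apriori-style) search for dense grid units.

    A unit is a tuple of (dimension, interval) pairs sorted by dimension.
    It is dense when the fraction of points inside it exceeds tau.
    Returns {k: set of dense k-dimensional units}.
    """
    cells = [cell_of(p, xi) for p in points]
    n, ndim = len(cells), len(cells[0])
    counts = Counter((d, c[d]) for c in cells for d in range(ndim))
    level = {((d, i),) for (d, i), cnt in counts.items() if cnt / n > tau}
    result, k = {}, 1
    while level:
        result[k] = level
        candidates = set()
        for u, v in combinations(sorted(level), 2):
            # Join: same first k-1 pairs, last pairs in different dimensions.
            if u[:-1] == v[:-1] and u[-1][0] < v[-1][0]:
                cand = u + (v[-1],)
                # Prune: every k-dimensional projection must itself be dense.
                if all(cand[:j] + cand[j + 1:] in level for j in range(k + 1)):
                    candidates.add(cand)
        # One pass over the data to count support for surviving candidates.
        support = Counter(
            cand for c in cells for cand in candidates
            if all(c[d] == i for d, i in cand)
        )
        level = {u for u in candidates if support[u] / n > tau}
        k += 1
    return result


def clusters(units):
    """Connected components of dense units: two units are connected when
    they agree in all dimensions but one, where the intervals are adjacent."""
    units, seen, out = set(units), set(), []
    for start in units:
        if start in seen:
            continue
        comp, stack = [], [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            comp.append(u)
            for j, (d, i) in enumerate(u):
                for step in (-1, 1):
                    nb = u[:j] + ((d, i + step),) + u[j + 1:]
                    if nb in units and nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
        out.append(sorted(comp))
    return out


def dnf(cluster, xi):
    # Unminimized DNF: one conjunction of interval conditions per unit.
    terms = []
    for unit in cluster:
        conds = [f"{i / xi:g} <= x{d} < {(i + 1) / xi:g}" for d, i in unit]
        terms.append("(" + " AND ".join(conds) + ")")
    return " OR ".join(terms)


# Toy data (hypothetical): two adjacent dense cells in dims 0-1, spread along dim 2.
zs = [0.1, 0.3, 0.5, 0.7, 0.9]
points = [(0.1, 0.1, z) for z in zs] + [(0.3, 0.1, z) for z in zs]
points += [(0.9, 0.9, 0.3), (0.5, 0.7, 0.9)]  # two noise points
units = dense_units(points, xi=5, tau=0.25)
for cl in clusters(units[max(units)]):
    print(dnf(cl, xi=5))
```

On this toy data no unit in dimension 2 is dense. The two dense units in dimensions 0 and 1 are adjacent, so they form one cluster, printed as `(0 <= x0 < 0.2 AND 0 <= x1 < 0.2) OR (0.2 <= x0 < 0.4 AND 0 <= x1 < 0.2)`. A minimized description would cover both units with the single conjunction `0 <= x0 < 0.4 AND 0 <= x1 < 0.2`. The sketch does not attempt that step.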
Pages: 5-33
Page count: 29