Density-ratio based clustering for discovering clusters with varying densities

被引:107
作者
Zhu, Ye [1 ]
Ting, Kai Ming [2 ]
Carman, Mark J. [1 ]
机构
[1] Monash Univ, Fac Informat Technol, Clayton, Vic 3800, Australia
[2] Federat Univ, Sch Engn & Informat Technol, Mt Helen, Vic 3842, Australia
关键词
Density-ratio; Varying densities; Density-based clustering; Scaling;
D O I
10.1016/j.patcog.2016.07.007
中图分类号
TP18 [人工智能理论];
学科分类号
140502 [人工智能];
摘要
Density-based clustering algorithms are able to identify clusters of arbitrary shapes and sizes in a dataset which contains noise. It is well-known that most of these algorithms, which use a global density threshold, have difficulty identifying all clusters in a dataset having clusters of greatly varying densities. This paper identifies and analyses the condition under which density-based clustering algorithms fail in this scenario. It proposes a density-ratio based method to overcome this weakness, and reveals that it can be implemented in two approaches. One approach is to modify a density-based clustering algorithm to do density-ratio based clustering by using its density estimator to compute density-ratio. The other approach involves rescaling the given dataset only. An existing density-based clustering algorithm, which is applied to the rescaled dataset, can find all clusters with varying densities that would otherwise impossible had the same algorithm been applied to the unscaled dataset. We provide an empirical evaluation using DBSCAN, OPTICS and SNN to show the effectiveness of these two approaches. (C) 2016 Elsevier Ltd. All rights reserved.
引用
收藏
页码:983 / 997
页数:15
相关论文
共 24 条
[1]
Agrawal R., 1998, SIGMOD Record, V27, P94, DOI 10.1145/276305.276314
[2]
Ankerst M, 1999, SIGMOD RECORD, VOL 28, NO 2 - JUNE 1999, P49
[3]
[Anonymous], 2012, Applied multidimensional scaling
[4]
[Anonymous], 2009, INT J GEOMATH, DOI DOI 10.1007/S13137-020-00149-9
[5]
DDSC : A Density Differentiated Spatial Clustering Technique [J].
Borah, B. ;
Bhattacharyya, D. K. .
JOURNAL OF COMPUTERS, 2008, 3 (02) :72-79
[6]
RANK TRANSFORMATIONS AS A BRIDGE BETWEEN PARAMETRIC AND NONPARAMETRIC STATISTICS [J].
CONOVER, WJ ;
IMAN, RL .
AMERICAN STATISTICIAN, 1981, 35 (03) :124-129
[7]
NEAREST NEIGHBOR PATTERN CLASSIFICATION [J].
COVER, TM ;
HART, PE .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1967, 13 (01) :21-+
[8]
Demsar J, 2006, J MACH LEARN RES, V7, P1
[9]
Ertöz L, 2003, SIAM PROC S, P47
[10]
Ester M., 1996, KDD-96 Proceedings. Second International Conference on Knowledge Discovery and Data Mining, P226