Fast and general density peaks clustering

被引:62
作者
Sieranoja, Sami [1 ]
Franti, Pasi [1 ]
机构
[1] Univ Eastern Finland, Sch Comp, Box 111, FIN-80101 Joensuu, Finland
关键词
Data clustering; Density peaks; k-nearest neighbors (kNN); Large-scale data; NEAREST NEIGHBORS; FAST SEARCH; ALGORITHM; FRAMEWORK; FIND;
D O I
10.1016/j.patrec.2019.10.019
中图分类号
TP18 [人工智能理论];
学科分类号
140502 [人工智能];
摘要
Density peaks is a popular clustering algorithm, used for many different applications, especially for nonspherical data. Although powerful, its use is limited by quadratic time complexity, which makes it slow for large datasets. In this work, we propose a fast density peaks algorithm that solves the time complexity problem. The proposed algorithm uses a fast and generic construction of approximate k-nearest neighbor graph both for density and for delta calculation. This approach maintains the generality of density peaks, which allows using it for all types of data, as long as a distance function is provided. For a dataset of size 100,000, our approach achieves a 91:1 speedup factor. The algorithm scales up for datasets up to 1 million in size, which could not be solved by the original algorithm at all. With the proposed method, time complexity is no longer a limiting factor of the density peaks clustering. (C) 2019 The Authors. Published by Elsevier B.V.
引用
收藏
页码:551 / 558
页数:8
相关论文
共 44 条
[1]
Clustering aggregation [J].
Yahoo Research Labs., Barcelona ;
不详 ;
不详 ;
不详 .
ACM Trans. Knowl. Discov. Data, 2007, 1
[2]
An overlapping community detection algorithm based on density peaks [J].
Bai, Xueying ;
Yang, Peilin ;
Shi, Xiaohu .
NEUROCOMPUTING, 2017, 226 :7-15
[3]
Support vector clustering [J].
Ben-Hur, A ;
Horn, D ;
Siegelmann, HT ;
Vapnik, V .
JOURNAL OF MACHINE LEARNING RESEARCH, 2002, 2 (02) :125-137
[4]
Brew C., 2017, P 2 INT C NEW METH L, P45
[5]
Robust path-based spectral clustering [J].
Chang, Hong ;
Yeung, Dit-Yan .
PATTERN RECOGNITION, 2008, 41 (01) :191-203
[6]
Identification and characterization of the cysteine protease inhibitor gene MdCPI from Musca domestica [J].
Dong, X. ;
Liu, F. ;
Zhang, D. ;
Tang, T. ;
Ge, X. .
INSECT MOLECULAR BIOLOGY, 2011, 20 (05) :577-586
[7]
A novel density peaks clustering algorithm for mixed data [J].
Du, Mingjing ;
Ding, Shifei ;
Xue, Yu .
PATTERN RECOGNITION LETTERS, 2017, 97 :46-53
[8]
Study on density peaks clustering based on k-nearest neighbors and principal component analysis [J].
Du, Mingjing ;
Ding, Shifei ;
Jia, Hongjie .
KNOWLEDGE-BASED SYSTEMS, 2016, 99 :135-145
[9]
Ester M., 1996, P 2 INT C KNOWL DISC, V96, P226, DOI DOI 10.5555/3001460.3001507
[10]
Community detection in graphs [J].
Fortunato, Santo .
PHYSICS REPORTS-REVIEW SECTION OF PHYSICS LETTERS, 2010, 486 (3-5) :75-174