Adaptive density peak clustering based on K-nearest neighbors with aggregating strategy

Cited by: 345
Authors
Liu Yaohui [1, 2]
Ma Zhengming [1]
Yu Fang [2]
Affiliations
[1] Sun Yat Sen Univ, Sch Elect & Informat Technol, Guangzhou 510006, Guangdong, Peoples R China
[2] Xiangnan Univ, Sch Software & Commun Engn, Chenzhou 423000, Hunan, Peoples R China
Keywords
Clustering algorithm; Density peaks; K-nearest neighbors; Aggregating; FIND;
DOI
10.1016/j.knosys.2017.07.010
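Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
140502 [Artificial intelligence];
Abstract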
A density-peaks-based clustering algorithm (DPC) was recently proposed that groups data by building a decision graph and quickly identifying cluster centers from it. DPC is simple yet efficient because it is non-iterative and needs few parameters. However, an improper choice of its cutoff distance parameter d_c leads to wrongly selected initial cluster centers, and DPC cannot correct this in the subsequent assignment process. Furthermore, in some cases the initial cluster centers remain difficult to select from the decision graph even when d_c is set properly. To overcome these defects, this paper proposes an adaptive clustering algorithm named ADPC-KNN. We use the idea of K-nearest neighbors to compute the global parameter d_c and the local density ρ_i of each point, apply a new approach to select initial cluster centers automatically, and finally aggregate clusters if they are density reachable. ADPC-KNN requires only one parameter, and the clustering is automatic. Experiments on synthetic and real-world data show that the proposed algorithm often outperforms DBSCAN, DPC, K-Means++, Expectation Maximization (EM), and single-link. © 2017 Elsevier B.V. All rights reserved.
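The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea of deriving the DPC decision-graph quantities from K-nearest neighbors. The function name `knn_density_peaks`, the choice of the cutoff as the mean distance to the k-th neighbor, and the Gaussian-kernel density are all assumptions made for illustration, not the definitions used in ADPC-KNN. Automatic center selection and the aggregation of density-reachable clusters are not shown.

```python
import numpy as np

def knn_density_peaks(X, k):
    """Return (rho, delta, d_c) for points X of shape (n, d).

    Illustrative only: the d_c and rho formulas below are assumptions,
    not the ones defined in the ADPC-KNN paper.
    """
    n = len(X)
    # Pairwise Euclidean distances.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Distances to the k nearest neighbors (column 0 is the point itself).
    knn_dist = np.sort(dist, axis=1)[:, 1:k + 1]
    # Assumed cutoff: mean distance to the k-th nearest neighbor.
    d_c = knn_dist[:, -1].mean()
    # Assumed local density: Gaussian kernel summed over the k neighbors.
    rho = np.exp(-(knn_dist / d_c) ** 2).sum(axis=1)
    # delta: distance to the nearest point of higher density (as in DPC);
    # the densest point gets its largest distance to any other point.
    order = np.argsort(-rho)
    delta = np.empty(n)
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        delta[i] = dist[i, order[:rank]].min()
    return rho, delta, d_c
```

Points that combine a high `rho` with a high `delta` stand out in the decision graph and are the usual candidates for cluster centers; here the only user-supplied parameter is `k`.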
Pages: 208-220
Number of pages: 13