Fast density peak clustering for large scale data based on kNN

被引:171
作者
Chen, Yewang [1 ,2 ,3 ,4 ,5 ]
Hu, Xiaoliang [1 ]
Fan, Wentao [1 ,5 ]
Shen, Lianlian [1 ]
Zhang, Zheng [1 ]
Liu, Xin [1 ,2 ,5 ]
Du, Jixiang [1 ,5 ]
Li, Haibo [1 ,7 ]
Chen, Yi [3 ]
Li, Hailin [6 ]
机构
[1] Huaqiao Univ, Coll Comp Sci & Technol, Xiamen, Fujian, Peoples R China
[2] Xidian Univ, State Key Lab Integrated Serv Networks, Xian, Shaanxi, Peoples R China
[3] Beijing Technol & Business Univ, Beijing Key Lab Big Data Technol Food Safety, Beijing, Peoples R China
[4] Soochow Univ, Prov Key Lab Comp Informat Proc Technol, Suzhou, Peoples R China
[5] Huaqiao Univ, Fujian Key Lab Big Data Intelligence & Secur, Xiamen, Fujian, Peoples R China
[6] Huaqiao Univ, Coll Business Adm, Quanzhou, Fujian, Peoples R China
[7] Huaqiao Univ, Xiamen Engn Res Ctr Enterprise Interoperabil & Bu, Xiamen, Fujian, Peoples R China
基金
美国国家科学基金会;
关键词
Density peak; FastDPeak; kNN-density; NEIGHBOR SEARCH;
D O I
10.1016/j.knosys.2019.06.032
中图分类号
TP18 [人工智能理论];
学科分类号
140502 [人工智能];
摘要
Density Peak (DPeak) clustering algorithm is not applicable for large scale data, due to two quantities, i.e, rho and delta, are both obtained by brute force algorithm with complexity O(n(2)). Thus, a simple but fast DPeak, namely FastDPeak,(1) is proposed, which runs in about O(nlog(n)) expected time in the intrinsic dimensionality. It replaces density with kNN-density, which is computed by fast kNN algorithm such as cover tree, yielding huge improvement for density computations. Based on kNN-density, local density peaks and non-local density peaks are identified, and a fast algorithm, which uses two different strategies to compute delta for them, is also proposed with complexity O(n). Experimental results show that FastDPeak is effective and outperforms other variants of DPeak. (C) 2019 Elsevier B.V. All rights reserved.
引用
收藏
页数:7
相关论文
共 38 条
[1]
[Anonymous], I SYMP CONSUM ELECTR
[2]
Fast density clustering strategies based on the k-means algorithm [J].
Bai, Liang ;
Cheng, Xueqi ;
Liang, Jiye ;
Shen, Huawei ;
Guo, Yike .
PATTERN RECOGNITION, 2017, 71 :375-386
[3]
MULTIDIMENSIONAL BINARY SEARCH TREES USED FOR ASSOCIATIVE SEARCHING [J].
BENTLEY, JL .
COMMUNICATIONS OF THE ACM, 1975, 18 (09) :509-517
[4]
Beygelzimer A., 2006, ICML
[5]
Robust path-based spectral clustering [J].
Chang, Hong ;
Yeung, Dit-Yan .
PATTERN RECOGNITION, 2008, 41 (01) :191-203
[6]
Theoretical analysis and crashworthiness optimization of hybrid multi-cell structures [J].
Chen, Tengteng ;
Zhang, Yong ;
Lin, Jiming ;
Lu, Yong .
THIN-WALLED STRUCTURES, 2019, 142 :116-131
[7]
Semi-Convex Hull Tree: Fast Nearest Neighbor Queries for Large Scale Data on GPUs [J].
Chen, Yewang ;
Zhou, Lida ;
Bouguila, Nizar ;
Zhong, Bineng ;
Wu, Fei ;
Lei, Zhen ;
Du, Jixiang ;
Li, Hailin .
2018 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2018, :911-916
[8]
Fast neighbor search by using revised k-d tree [J].
Chen, Yewang ;
Zhou, Lida ;
Tang, Yi ;
Singh, Jai Puneet ;
Bouguila, Nizar ;
Wang, Cheng ;
Wang, Huazhen ;
Du, Jixiang .
INFORMATION SCIENCES, 2019, 472 :145-162
[9]
A fast clustering algorithm based on pruning unnecessary distance computations in DBSCAN for high-dimensional data [J].
Chen, Yewang ;
Tang, Shengyu ;
Bouguila, Nizar ;
Wang, Cheng ;
Du, Jixiang ;
Li, HaiLin .
PATTERN RECOGNITION, 2018, 83 :375-387
[10]
DHeat: A Density Heat-Based Algorithm for Clustering With Effective Radius [J].
Chen, Yewang ;
Tang, Shengyu ;
Pei, Songwen ;
Wang, Cheng ;
Du, Jixiang ;
Xiong, Naixue .
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2018, 48 (04) :649-660