A novel cluster center fast determination clustering algorithm

Cited by: 69

Authors:

Chen Jinyin ^{[1]}

Lin Xiang ^{[1]}

Zheng Haibing ^{[1]}

Bao Xintong ^{[1]}

Affiliations:

[1] Zhejiang Univ Technol, Coll Informat Engn, Hangzhou, Zhejiang, Peoples R China

Source:

APPLIED SOFT COMPUTING | 2017 / Vol. 57

Funding:

National Natural Science Foundation of China;

Keywords:

Data mining; Clustering algorithm; Rapid determination of cluster centers; Density based clustering;

DOI:

10.1016/j.asoc.2017.04.031

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification code:

140502 [Artificial intelligence];

Abstract:

As one of the most important techniques in data mining, cluster analysis has attracted increasing attention in the big data era. Most clustering algorithms face challenges including difficulty in determining cluster centers, low clustering accuracy, uneven clustering efficiency across data sets, and sensitivity to parameters. To address the difficulty of determining cluster centers and the parameter dependence, this paper proposes a novel clustering algorithm with fast cluster center determination. Cluster centers are assumed to be data points with higher density and larger distance from other data points of higher density. Normal distribution curves are designed to fit the distribution of the density-distance product. The singular points that fall outside a chosen confidence interval are shown, by theoretical analysis and simulations, to be cluster centers. Finally, given these cluster centers, a time scan clustering step assigns the remaining points by density to complete the clustering. Because the density radius is a sensitive parameter in calculating the density of each data point, a mountain climbing (hill-climbing) algorithm is used to make the density radius self-adaptive. The proposed algorithm is evaluated on a wide range of typical benchmark data sets and compared with other clustering algorithms in terms of both clustering quality and time complexity. (C) 2017 Published by Elsevier B.V.
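
The pipeline outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it only mirrors the stated idea under several assumptions. Density is taken as a neighbour count within a fixed `radius`, set by hand rather than tuned by hill climbing. The densest point receives its largest pairwise distance as its "distance", a convention borrowed from density-peaks clustering. The normal fit is reduced to the mean and standard deviation of the density-distance product, with a hypothetical threshold `z` standing in for the confidence interval. All function and parameter names are illustrative.

```python
import numpy as np

def density_and_delta(points, radius):
    """Return density rho, distance delta to the nearest denser point,
    the index of that denser point (parent), and the density ordering."""
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    rho = (dist < radius).sum(axis=1) - 1      # neighbour count, excluding self
    order = np.argsort(-rho, kind="stable")    # densest first, ties by index
    delta = np.empty(len(points))
    parent = np.full(len(points), -1)
    # Assumption: the densest point gets its largest pairwise distance.
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, len(order)):
        i = order[rank]
        higher = order[:rank]                  # points ranked denser than i
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        parent[i] = j
    return rho, delta, parent, order

def find_centers(rho, delta, z=3.0):
    """Flag points whose density-distance product is an upper outlier
    under a normal approximation (mean + z * std). z is hypothetical."""
    gamma = rho * delta
    return np.flatnonzero(gamma > gamma.mean() + z * gamma.std())

def cluster(points, radius, z=3.0):
    """Pick centers, then label the rest in one pass in density order."""
    rho, delta, parent, order = density_and_delta(points, radius)
    centers = find_centers(rho, delta, z)
    labels = np.full(len(points), -1)
    labels[centers] = np.arange(len(centers))
    for i in order:
        # Each point inherits the label of its nearest denser neighbour,
        # which has already been visited because of the ordering.
        if labels[i] == -1 and parent[i] != -1:
            labels[i] = labels[parent[i]]
    return labels, centers

if __name__ == "__main__":
    # Two well-separated 5x5 grids as toy data.
    pts = np.array([[x + cx, y] for cx in (0.0, 100.0)
                    for x in range(-2, 3) for y in range(-2, 3)], dtype=float)
    labels, centers = cluster(pts, radius=1.5)
    print(centers, labels)
```

In this sketch a point keeps the label `-1` if the densest point is not selected as a center, which signals that `radius` or `z` needs adjusting for the data at hand.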


Pages: 539-555

Page count: 17
