SNCStream+ : Extending a high quality true anytime data stream clustering algorithm

Cited by: 15
Authors
Barddal, Jean Paul [1]
Gomes, Heitor Murilo [1]
Enembreck, Fabricio [1]
Barthes, Jean-Paul [2]
Affiliations
[1] Pontificia Univ Catolica Parana, Programa Posgrad Informat, Curitiba, Parana, Brazil
[2] UTC, Compiegne, France
Keywords
Data stream clustering; Unsupervised learning; Social networks theory
DOI
10.1016/j.is.2016.06.007
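Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
080201 [Mechanical manufacturing and automation]
Abstract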
Data stream clustering is an active area of research that requires efficient algorithms capable of finding and updating clusters incrementally as data arrives. Moreover, because data streams are inherently evolving, algorithms are expected to face both concept drifts and evolutions, which the clustering algorithm must take into account by allowing incremental clustering updates. In this paper we present the Social Network Clusterer Stream+ (SNCStream+). SNCStream+ tackles the data stream clustering problem as a network formation and evolution problem, where instances and micro-clusters form clusters based on homophily. We analyze the parameters of our proposal and evaluate it on a broad set of problems against baselines from the literature. Results show that SNCStream+ achieves superior clustering quality (CMM) with feasible processing time and memory usage when compared to the original SNCStream and other proposals from the literature. (C) 2016 Elsevier Ltd. All rights reserved.
Pages: 60-73
Number of pages: 14