Density-Based Clustering over an Evolving Data Stream with Noise

Cited by: 498
Authors
Cao, Feng [1]
Ester, Martin [2]
Qian, Weining [1]
Zhou, Aoying [1]
Affiliations
[1] Fudan Univ, Dept Comp Sci & Engn, Shanghai, Peoples R China
[2] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC, Canada
Source
PROCEEDINGS OF THE SIXTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING | 2006
Keywords
Data mining algorithms; Density based clustering; Evolving data streams
DOI
10.1137/1.9781611972764.29
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering is an important task in mining evolving data streams. Besides the limited-memory and one-pass constraints, the nature of evolving data streams implies the following requirements for stream clustering: no assumption on the number of clusters, discovery of clusters with arbitrary shape, and the ability to handle outliers. While a lot of clustering algorithms for data streams have been proposed, they offer no solution to the combination of these requirements. In this paper, we present DenStream, a new approach for discovering clusters in an evolving data stream. The "dense" micro-cluster (named core-micro-cluster) is introduced to summarize the clusters with arbitrary shape, while the potential core-micro-cluster and outlier micro-cluster structures are proposed to maintain and distinguish the potential clusters and outliers. A novel pruning strategy is designed based on these concepts, which guarantees the precision of the weights of the micro-clusters with limited memory. Our performance study over a number of real and synthetic data sets demonstrates the effectiveness and efficiency of our method.
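
The abstract names three micro-cluster roles (core, potential core, outlier) without giving formulas. The following is a minimal sketch of how such a distinction could work, not the authors' implementation. It assumes an exponential fading weight of the form 2^(-lam * elapsed_time) and two hypothetical thresholds, `mu` (weight needed to count as core) and `beta` (fraction of `mu` needed to count as potential). The class name, method names, and the weight-only classification rule are illustrative assumptions.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class MicroCluster:
    """Decayed summary of nearby stream points (illustrative sketch only)."""
    dim: int
    lam: float                      # assumed decay rate of the fading weight
    last_update: float = 0.0        # timestamp of the most recent decay
    weight: float = 0.0             # decayed count of absorbed points
    linear_sum: List[float] = field(default_factory=list)
    squared_sum: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.linear_sum:
            self.linear_sum = [0.0] * self.dim
        if not self.squared_sum:
            self.squared_sum = [0.0] * self.dim

    def decay_to(self, now: float) -> None:
        """Fade all statistics by 2^(-lam * elapsed) (assumed fading form)."""
        factor = 2.0 ** (-self.lam * (now - self.last_update))
        self.weight *= factor
        self.linear_sum = [v * factor for v in self.linear_sum]
        self.squared_sum = [v * factor for v in self.squared_sum]
        self.last_update = now

    def insert(self, point: List[float], now: float) -> None:
        """Absorb one point after fading the existing statistics."""
        self.decay_to(now)
        self.weight += 1.0
        for i, x in enumerate(point):
            self.linear_sum[i] += x
            self.squared_sum[i] += x * x

    def center(self) -> List[float]:
        return [v / self.weight for v in self.linear_sum]

    def radius(self) -> float:
        """Spread estimate from the weighted second moment around the center."""
        var = sum(
            max(s / self.weight - (l / self.weight) ** 2, 0.0)
            for l, s in zip(self.linear_sum, self.squared_sum)
        )
        return math.sqrt(var)


def classify(mc: MicroCluster, now: float, mu: float, beta: float) -> str:
    """Label a micro-cluster by its decayed weight (hypothetical rule)."""
    mc.decay_to(now)
    if mc.weight >= mu:
        return "core"
    if mc.weight >= beta * mu:
        return "potential"
    return "outlier"


if __name__ == "__main__":
    mc = MicroCluster(dim=2, lam=1.0)
    mc.insert([0.0, 0.0], now=0.0)
    mc.insert([2.0, 2.0], now=0.0)
    for t in (0.0, 1.0, 3.0):
        print(t, classify(mc, now=t, mu=2.0, beta=0.25), mc.weight)
```

In this sketch a micro-cluster that stops receiving points drifts from core to potential to outlier as its weight fades, which is one way to read the abstract's remark that the structures "maintain and distinguish the potential clusters and outliers". A fuller treatment would also bound the radius and prune low-weight outlier micro-clusters, which the sketch leaves out.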
Pages: 328 / +
Page count: 2