Streaming-data algorithms for high-quality clustering

Cited by: 216
Authors
O'Callaghan, L [1 ]
Mishra, N [1 ]
Meyerson, A [1 ]
Guha, S [1 ]
Motwani, R [1 ]
Affiliation
[1] Stanford Univ, Stanford, CA 94305 USA
Source
18TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS | 2002
DOI
10.1109/ICDE.2002.994785
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Streaming data analysis has recently attracted attention in numerous applications, including telephone records, web documents, and clickstreams. For such analysis, single-pass algorithms that consume a small amount of memory are critical. We describe such a streaming algorithm that effectively clusters large data streams. We also provide empirical evidence of the algorithm's performance on synthetic and real data streams.
Pages: 685-694
Number of pages: 10
Related papers
27 in total
[1]  
Ankerst M, 1999, P SIGMOD
[2]  
[Anonymous], 2006, PATTERN CLASSIFICATI
[3]  
Bradley P. S., 1998, Proceedings Fourth International Conference on Knowledge Discovery and Data Mining, P9
[4]  
BRADLEY PS, 1998, P 15 INT C MACH LEAR, P91
[5]  
CHARIKAR M, 1999, P FOCS
[6]  
Ester M., 1996, DENSITY BASED ALGORI
[7]  
FARNSTROM F, 2000, SIGKDD EXPL
[8]  
FEIGENBAUM J, 1999, APPROXIMATE 11 DIFFE
[9]  
Guha S., 1998, SIGMOD Record, V27, P73, DOI 10.1145/276305.276312
[10]   Clustering data streams [J].
Guha, S ;
Mishra, N ;
Motwani, R ;
O'Callaghan, L .
41ST ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 2000, :359-366