Clustering data streams: Theory and practice

被引：404

作者：

Guha, S

Meyerson, A

Mishra, N

Motwani, R

O'Callaghan, L

机构：

[1] Univ Penn, Dept Comp Sci, Philadelphia, PA 19104 USA

[2] Carnegie Mellon Univ, Dept Comp Sci, Pittsburgh, PA 15213 USA

[3] Hewlett Packard Labs, Palo Alto, CA 94304 USA

[4] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA

来源：

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING | 2003年 / 15卷 / 03期

基金：

美国国家科学基金会;

关键词：

clustering; data streams; approximation algorithms;

D O I：

10.1109/TKDE.2003.1198387

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

The data stream model has recently attracted attention for its applicability to numerous types of data, including telephone records, Web documents, and clickstreams. For analysis of such data, the ability to process the data in a single pass, or a small number of passes, While using little memory, is crucial. We describe such a Streaming algorithm that effectively clusters large data streams. We also provide empirical evidence of the algorithm's performance on synthetic and real data streams.

引用

页码：515 / 528

页数：14

共 75 条

[1]

Achlioptas D., 2001, PROC 33 ACM S THEORY, P611

[2]

Agarwal PK, 2000, PROCEEDINGS OF THE ELEVENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P538

[3]

AGRAWAL R, 1998, P SIGMOD

[4]

Alon N., 1996, Proceedings of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing, P20, DOI 10.1145/237814.237823

[5]

Ankerst M, 1999, P SIGMOD

[6]

[Anonymous], 1999, 40 ANN S FDN COMP SC

[7]

[Anonymous], P 4 INT C KNOWL DISC

[8]

Arora S., 1998, Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, P106, DOI 10.1145/276698.276718

[9]

Arya V., 2001, 33 ACM S THEORY COMP, P21

[10]

Babcock B., 2002, P ACM S DISCR ALG

← 1 2 3 4 5 6 7 8 →