Clustering data streams

被引:272
作者
Guha, S [1 ]
Mishra, N [1 ]
Motwani, R [1 ]
O'Callaghan, L [1 ]
机构
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
来源
41ST ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS | 2000年
关键词
D O I
10.1109/SFCS.2000.892124
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
We study clustering under the data stream model of computation where: given a sequence of points, the objective is to maintain a consistently good clustering of the sequence observed so far, using a small amount of: memory and time. The data stream model is relevant to new classes of applications involving massive data sets, such as web click stream analysis and multimedia data analysis. We give constant-factor approximation algorithms for the Ic-Median problem in the data stream model of computation in a single pass. We also show negative results implying that our algorithms cannot be improved in a certain sense.
引用
收藏
页码:359 / 366
页数:8
相关论文
共 17 条
  • [1] [Anonymous], 1999, 40 ANN S FDN COMP SC
  • [2] Charikar M., 1999, Proceedings of the Thirty-First Annual ACM Symposium on Theory of Computing, P1, DOI 10.1145/301250.301257
  • [3] CHARIKAR M, 1997, P 29 ANN ACM S THEOR
  • [4] FEIGENBAUM J, 1999, P 40 ANN S FDN COMP, P501
  • [5] FRIEZE A, 1998, P 39 ANN IEEE S FDN
  • [6] HENZINGER MR, 1998, 1998011 DIG EQ CORP
  • [7] Indyk P., 1999, Proceedings of the Thirty-First Annual ACM Symposium on Theory of Computing, P428, DOI 10.1145/301250.301366
  • [8] Jain K., 1999, P NDSS, P1
  • [9] KANN V, HARDNESS APPROXIMATI
  • [10] Randomized query processing in robot path planning
    Kavraki, LE
    Latombe, JC
    Motwani, R
    Raghavan, P
    [J]. JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 1998, 57 (01) : 50 - 60