The Space Complexity of Pass-Efficient Algorithms for Clustering

被引:5
作者
Chang, Kevin L. [1 ]
Kannan, Ravi [1 ]
机构
[1] Yale Univ, Dept Comp Sci, New Haven, CT 06520 USA
来源
PROCEEDINGS OF THE SEVENTHEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS | 2006年
关键词
D O I
10.1145/1109557.1109685
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
We present multiple pass streaming algorithms for a basic clustering problem for massive data sets. If our algorithm is allotted 2l passes, it will produce an approximation with error at most e using (O) over tilde (k(3)/epsilon(2)/(l)) bits of memory, the most critical resource for streaming computation. We demonstrate that this tradeoff between passes and memory allotted is intrinsic to the problem and model of computation by proving lower bounds on the memory requirements of any e pass randomized algorithm that are nearly matched by our upper bounds. To the best of our knowledge, this is the first time nearly matching bounds have been proved for such an exponential tradeoff for randomized computation. In this problem, we are given a set of n points drawn randomly according to a mixture of k uniform distributions and wish to approximate the density function of the mixture. The points are placed in a datastream (possibly in adversarial order), which may only be read sequentially by the algorithm. We argue that this models, among others, the datastream produced by a national census of the incomes of all citizens.
引用
收藏
页码:1157 / 1166
页数:10
相关论文
共 19 条
[1]  
Alon N., 1996, Proceedings of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing, P20, DOI 10.1145/237814.237823
[2]  
[Anonymous], 2003, P 35 ANN ACM S THEOR, DOI DOI 10.1145/780542.780548
[3]  
ARORA S, 2001, P 33 ANN ACM S THEOR, P247
[4]  
Bar-Yossef Z, 2002, ANN IEEE SYMP FOUND, P209, DOI 10.1109/SFCS.2002.1181944
[5]  
CHARIKAR M, 1997, P 29 ANN ACM S THEOR, P626, DOI DOI 10.1145/258533.258657
[6]  
Chaudhuri S., 1998, SIGMOD Record, V27, P436, DOI 10.1145/276305.276343
[7]  
Dasgupta S., 1999, P 40 ANN S FDN COMP, P634, DOI DOI 10.1109/SFFCS.1999.814639
[8]  
Drineas P, 2003, SIAM PROC S, P223
[9]  
Feigenbaum J, 2005, PROCEEDINGS OF THE SIXTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P745
[10]  
Gibbons PB, 1997, PROCEEDINGS OF THE TWENTY-THIRD INTERNATIONAL CONFERENCE ON VERY LARGE DATABASES, P466