Deterministic Sampling and Range Counting in Geometric Data Streams

Cited by: 16
Authors
Bagchi, Amitabha [1]
Chaudhary, Amitabh [2]
Eppstein, David [3]
Goodrich, Michael T. [3]
Affiliations
[1] Indian Inst Technol, Dept Comp Sci & Engn, Hauz Khas, New Delhi 110016, India
[2] Notre Dame Univ, Dept Comp Sci & Engn, Notre Dame, IN 46556 USA
[3] Univ Calif Irvine, Sch Informat & Comp Sci, Irvine, CA 92697 USA
Keywords
Data streams; streaming algorithms; geometric data; sampling; robust statistics; epsilon nets; iceberg queries; range counting;
DOI
10.1145/1240233.1240239
Chinese Library Classification
TP301 [Theory and Methods]
Discipline classification code
081202
Abstract
We present memory-efficient deterministic algorithms for constructing epsilon-nets and epsilon-approximations of streams of geometric data. Unlike probabilistic approaches, these deterministic samples provide guaranteed bounds on their approximation factors. We show how our deterministic samples can be used to answer approximate online iceberg geometric queries on data streams. We use these techniques to approximate several robust statistics of geometric data streams, including Tukey depth, simplicial depth, regression depth, the Theil-Sen estimator, and the least median of squares. Our algorithms use only a polylogarithmic amount of memory, provided the desired approximation factors are at least inverse-polylogarithmic. We also include a lower bound for noniceberg geometric queries.
Pages: 18