Deterministic Sampling and Range Counting in Geometric Data Streams

Cited by: 16
Authors
Bagchi, Amitabha [1]
Chaudhary, Amitabh [2]
Eppstein, David [3]
Goodrich, Michael T. [3]
Affiliations
[1] Indian Inst Technol, Dept Comp Sci & Engn, Hauz Khas, New Delhi 110016, India
[2] Notre Dame Univ, Dept Comp Sci & Engn, Notre Dame, IN 46556 USA
[3] Univ Calif Irvine, Sch Informat & Comp Sci, Irvine, CA 92697 USA
Keywords
Data streams; streaming algorithms; geometric data; sampling; robust statistics; epsilon nets; iceberg queries; range counting;
DOI
10.1145/1240233.1240239
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
We present memory-efficient deterministic algorithms for constructing epsilon-nets and epsilon-approximations of streams of geometric data. Unlike probabilistic approaches, these deterministic samples provide guaranteed bounds on their approximation factors. We show how our deterministic samples can be used to answer approximate online iceberg geometric queries on data streams. We use these techniques to approximate several robust statistics of geometric data streams, including Tukey depth, simplicial depth, regression depth, the Theil-Sen estimator, and the least median of squares. Our algorithms use only a polylogarithmic amount of memory, provided the desired approximation factors are at least inverse-polylogarithmic. We also include a lower bound for noniceberg geometric queries.
Pages: 18