Compressed histograms with arbitrary bucket layouts for selectivity estimation

被引:11
作者
Fuchs, Dennis
He, Zhen
Lee, Byung Suk
机构
[1] La Trobe Univ, Dept Comp Sci & Comp Engn, Bundoora, Vic 3086, Australia
[2] Tele Atlas, Lebanon, NH 03766 USA
[3] Univ Vermont, Dept Comp Sci, Burlington, VT 05405 USA
基金
美国国家科学基金会;
关键词
selectivity; histogram; compression; self-tuning;
D O I
10.1016/j.ins.2006.07.013
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 [计算机科学与技术];
摘要
Selectivity estimation is an important step of query optimization in a database management system, and multi-dimensional histogram techniques have proved promising for selectivity estimation. Recent multi-dimensional histogram techniques such as GenHist and STHoles use an arbitrary bucket layout. This layout has the advantage of requiring a smaller number of buckets to model tuple densities than those required by the traditional grid or recursive layouts. However, the arbitrary bucket layout brings an inherent disadvantage of requiring more memory to store each bucket location information. This diminishes the advantage of requiring fewer buckets and, therefore, has an adverse effect on the resulting selectivity estimation accuracy. To our knowledge, however, no existing histogram-based technique with arbitrary layout addresses this issue. In this paper, we introduce the idea of bucket location compression and then demonstrate its effectiveness for improving selectivity estimation accuracy by proposing the STHoles+ technique. STHoles+ extends STHoles by quantizing each coordinate of a bucket relative to the coordinate of the smallest enclosing bucket. This quantization increases the number of histogram buckets that can be stored in the histogram. Our quantization scheme allows STHoles+ to trade precision of histogram bucket locations for storing more buckets. Experimental results show that STHoles+ outperforms STHoles on various data distributions, query distributions, and other factors such as available memory size, quantization resolution, and dimensionality of the data space. (C) 2006 Elsevier Inc. All rights reserved.
引用
收藏
页码:680 / 702
页数:23
相关论文
共 18 条
[1]
Aboulnaga A, 1999, SIGMOD RECORD, VOL 28, NO 2 - JUNE 1999, P181, DOI 10.1145/304181.304198
[2]
Acharya S, 1999, SIGMOD RECORD, VOL 28, NO 2 - JUNE 1999, P13, DOI 10.1145/304181.304184
[3]
[Anonymous], P ACM SIGMOD INT C M
[4]
Bruno N., 2001, SIGMOD RECORD, P211
[5]
CHEN CM, 1994, P ACM SIGMOD C MAY, P161
[6]
DESHPANDE A, 2001, P 2001 ACM SIGMOD IN, P199
[7]
Fast incremental maintenance of approximate histograms [J].
Gibbons, PB ;
Matias, Y ;
Poosala, V .
ACM TRANSACTIONS ON DATABASE SYSTEMS, 2002, 27 (03) :261-298
[8]
GUNOPULOS D, 2000, P ACM SIGMOD, P463
[9]
IOANNIDIS Y, 1991, P ACM SIGMOD C, P268
[10]
LIPTON RJ, 1990, PROCEEDINGS OF THE NINTH ACM SIGACT-SIGMOD-SIGART SYMPOSIUM ON PRINCIPLES OF DATABASE SYSTEMS, P40, DOI 10.1145/298514.298540