Estimating Join Selectivities using Bandwidth-Optimized Kernel Density Models

被引:53
作者
Kiefer, Martin [1 ]
Heimel, Max [2 ]
Bress, Sebastian [1 ,3 ]
Markl, Volker [1 ,3 ]
机构
[1] Tech Univ Berlin, Berlin, Germany
[2] Snowflake Comp, San Mateo, CA 94401 USA
[3] German Res Ctr Artificial Intelligence DFKI, Saarbrucken, Germany
来源
PROCEEDINGS OF THE VLDB ENDOWMENT | 2017年 / 10卷 / 13期
关键词
D O I
10.14778/3151106.3151112
中图分类号
TP [自动化技术、计算机技术];
学科分类号
080201 [机械制造及其自动化];
摘要
Accurately predicting the cardinality of intermediate plan operations is an essential part of any modern relational query optimizer. The accuracy of said estimates has a strong and direct impact on the quality of the generated plans, and incorrect estimates can have a negative impact on query performance. One of the biggest challenges in this field is to predict the result size of join operations. Kernel Density Estimation (KDE) is a statistical method to estimate multivariate probability distributions from a data sample. Previously, we introduced a modern, self-tuning selectivity estimator for range scans based on KDE that outperforms state-of-the-art multidimensional histograms and is efficient to evaluate on graphics cards. In this paper, we extend these bandwidth-optimized KDE models to estimate the result size of single and multiple joins. In particular, we propose two approaches: (1) Building a KDE model from a sample drawn from the join result. (2) Efficiently combining the information from base table KDE models. We evaluated our KDE-based join estimators on a variety of synthetic and real-world datasets, demonstrating that they are superior to state-of-the art join estimators based on sketching or sampling.
引用
收藏
页码:2085 / 2096
页数:12
相关论文
共 35 条
[1]
Alon N., 1999, Proceedings of the Eighteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, P10, DOI 10.1145/303976.303978
[2]
[Anonymous], 2004, SIGMOD
[3]
[Anonymous], 2015, GLARMA PACKAGE
[4]
[Anonymous], The nlopt nonlinear-optimization package
[5]
[Anonymous], 1993, THESIS
[6]
[Anonymous], 2009, Proc. VLDB Endow.
[7]
Robust Query Processing in Co-Processor-accelerated Databases [J].
Bress, Sebastian ;
Funke, Henning ;
Teubner, Jens .
SIGMOD'16: PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2016, :1891-1906
[8]
Bromiley PaulA., 2003, PRODUCTS CONVOLUTION
[9]
Bruno N, 2001, SIGMOD RECORD, V30, P211, DOI 10.1145/376284.375686
[10]
Chaudhuri S, 1999, SIGMOD RECORD, VOL 28, NO 2 - JUNE 1999, P263, DOI 10.1145/304181.304206