Selectivity and cost estimation for joins based on random sampling

被引:76
作者
Haas, PJ
Naughton, JF
Seshadri, S
Swami, AN
机构
[1] UNIV WISCONSIN, DEPT COMP SCI, MADISON, WI 53706 USA
[2] INDIAN INST TECHNOL, DEPT COMP SCI & ENGN, BOMBAY 400076, MAHARASHTRA, INDIA
[3] SILICON GRAPH COMP SYST, MT VIEW, CA 94043 USA
基金
美国国家科学基金会;
关键词
D O I
10.1006/jcss.1996.0041
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 [计算机科学与技术];
摘要
We compare the performance of sampling-based procedures for estimating the selectivity of a join. While some of the procedures have been proposed in the database literature, their relative performance has never been analyzed. A main result of this paper is a partial ordering that compares the variability of the estimators for the different procedures after an arbitrary fixed number of sampling steps. Prior to the current work, it was also unknown whether these fixed-step procedures could be extended to fixed-precision procedures that are both asymptotically consistent and asymptotically efficient.: Our second main result is a general method for such an extension and a proof that the method is valid for all the procedures under consideration. We show that, under plausible assumptions on sampling costs, the partial ordering of the fixed-step procedures with respect to variability of the selectivity estimator implies a partial ordering of the corresponding fixed-precision procedures with respect to sampling cost. Our final result is a collection of fixed-step and fixed-precision procedures for estimating the cost of processing a join query according to a fixed join plan. (C) 1996 Academic Press, Inc.
引用
收藏
页码:550 / 569
页数:20
相关论文
共 34 条
[1]
LARGE-SAMPLE THEORY OF SEQUENTIAL ESTIMATION [J].
ANSCOMBE, FJ .
PROCEEDINGS OF THE CAMBRIDGE PHILOSOPHICAL SOCIETY, 1952, 48 (04) :600-607
[2]
Billingsley P., 1986, PROBABILITY MEASURE
[3]
ON THE ASYMPTOTIC THEORY OF FIXED-WIDTH SEQUENTIAL CONFIDENCE-INTERVALS FOR THE MEAN [J].
CHOW, YS ;
ROBBINS, H .
ANNALS OF MATHEMATICAL STATISTICS, 1965, 36 (02) :457-462
[4]
Cochran W.G., 2007, SAMPLING TECHNIQUES
[5]
DEWITT DJ, 1992, PROC INT CONF VERY L, P27
[6]
DEWITT DJ, 1984, P ACM SIGMOD INT C M, P1
[7]
QUERY EVALUATION TECHNIQUES FOR LARGE DATABASES [J].
GRAEFE, G .
COMPUTING SURVEYS, 1993, 25 (02) :73-170
[8]
Gut, 1988, STOPPED RANDOM WALKS
[9]
HAAS LM, 1993, SEEKING TRUTH ADHOC
[10]
Haas P. J., 1994, Proceedings of the Thirteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. PODS 1994, P14, DOI 10.1145/182591.182594