Top-k selection queries over relational databases:: Mapping strategies and performance evaluation

被引:109
作者
Bruno, N
Chaudhuri, S
Gravano, L
机构
[1] Columbia Univ, Dept Comp Sci, New York, NY 10027 USA
[2] Microsoft Corp, Redmond, WA 98052 USA
来源
ACM TRANSACTIONS ON DATABASE SYSTEMS | 2002年 / 27卷 / 02期
关键词
algorithms; measurement; performance; multidimensional histograms; top-k query processing;
D O I
10.1145/568518.568519
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
In many applications, users specify target values for certain attributes, without requiring exact matches to these values in return. Instead, the result to such queries is typically a rank of the "top k" tuples that best match the given attribute values. In this paper, we study the advantages and limitations of processing a top-k query by translating it into a single range query that a traditional relational database management system (RDBMS) can process efficiently. In particular, we study how to determine a range query to evaluate a top-k query by exploiting the statistics available to an RDBMS, and the impact of the quality of these statistics on the retrieval efficiency of the resulting scheme. We also report the first experimental evaluation of the mapping strategies over a real RDBMS, namely over Microsoft's SQL Server 7.0. The experiments show that our new techniques are robust and significantly more efficient than previously known strategies requiring at least one sequential scan of the data sets.
引用
收藏
页码:153 / 187
页数:35
相关论文
共 34 条
  • [1] Aboulnaga A, 1999, SIGMOD RECORD, VOL 28, NO 2 - JUNE 1999, P181, DOI 10.1145/304181.304198
  • [2] [Anonymous], DATABASE THEORY
  • [3] AVNUR R, 1998, P 1998 ACM INT C MAN
  • [4] BELUSSI A, 1995, P 21 INT C VER LARG
  • [5] Blake C.L., 1998, UCI repository of machine learning databases
  • [6] BRUNO N, 2001, P 2001 ACM SIGMOD IN
  • [7] BRUNO N, 2000, CUCS02100
  • [8] CAREY MJ, 1997, P 1997 ACM INT C MAN
  • [9] CAREY MJ, 1998, P 24 INT C VER LARG
  • [10] Chaudhuri S, 1997, PROCEEDINGS OF THE TWENTY-THIRD INTERNATIONAL CONFERENCE ON VERY LARGE DATABASES, P146