The threshold algorithm: From middleware systems to the relational engine

被引:13
作者
Bruno, Nicolas
Wang, Hui
机构
[1] Microsoft Corp, Redmond, WA 98052 USA
[2] Univ British Columbia, Dept Comp Sci, Vancouver, BC V6T 1Z4, Canada
关键词
query processing; relational databases; retrieval models; search process;
D O I
10.1109/TKDE.2007.1011
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The answer to a top-k query is an ordered set of tuples, where the ordering is based on how closely each tuple matches the query. In the context of middleware systems, new algorithms to answer top-k queries have been recently proposed. Among these, the Threshold Algorithm (TA) is the most well-known instance due to its simplicity and memory requirements. TA is based on an early-termination condition and can evaluate top-k queries without examining all the tuples. This top-k query model is prevalent not only over middleware systems, but also over plain relational data. In this work, we analyze the challenges that must be addressed to adapt TA to a relational database system. We show that, depending on the available indices, many alternative TA strategies can be used to answer a given query. Choosing the best alternative requires a cost model that can be seamlessly integrated with that of current optimizers. In this work, we address these challenges and conduct an extensive experimental evaluation of the resulting techniques by characterizing which scenarios can take advantage of TA-like algorithms to answer top-k queries in relational database systems.
引用
收藏
页码:523 / 537
页数:15
相关论文
共 17 条
[1]   RxW: A scheduling approach for large-scale on-demand data broadcast [J].
Aksoy, D ;
Franklin, M .
IEEE-ACM TRANSACTIONS ON NETWORKING, 1999, 7 (06) :846-860
[2]  
Blake C.L., 1998, UCI repository of machine learning databases
[3]   Top-k selection queries over relational databases:: Mapping strategies and performance evaluation [J].
Bruno, N ;
Chaudhuri, S ;
Gravano, L .
ACM TRANSACTIONS ON DATABASE SYSTEMS, 2002, 27 (02) :153-187
[4]  
Bruno N., 2001, SIGMOD RECORD, P211
[5]  
Carey M. J., 1997, SIGMOD Record, V26, P219, DOI 10.1145/253262.253302
[6]  
Chang K. C.-C., 2002, PROC ACM SIGMOD INT, P346
[7]  
Chen CM, 2002, PROC INT CONF DATA, P617, DOI 10.1109/ICDE.2002.994779
[8]  
Fagin R., 1996, Proceedings of the Fifteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. PODS 1996, P216, DOI 10.1145/237661.237715
[9]   Optimal aggregation algorithms for middleware [J].
Fagin, R ;
Lotem, A ;
Naor, M .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2003, 66 (04) :614-656
[10]  
GUNOPULOS D, 2000, P ACM SIGMOD, P463