Supporting top-k join queries in relational databases

被引：102

作者：

Ilyas, IF ^{[1
]}

Aref, WG

Elmagarmid, AK

机构：

[1] Univ Waterloo, Sch Comp Sci, Waterloo, ON N2L 3G1, Canada

[2] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA

来源：

VLDB JOURNAL | 2004年 / 13卷 / 03期

关键词：

ranking; top-k queries; rank aggregation; query operators;

D O I：

10.1007/s00778-004-0128-2

中图分类号：

TP3 [计算技术、计算机技术];

学科分类号：

0812 ;

摘要：

Ranking queries, also known as top-k queries, produce results that are ordered on some computed score. Typically, these queries involve joins, where users are usually interested only in the top-k join results. Top-k queries are dominant in many emerging applications, e.g., multimedia retrieval by content, Web databases, data mining, middlewares, and most information retrieval applications. Current relational query processors do not handle ranking queries efficiently, especially when joins are involved. In this paper, we address supporting top-k join queries in relational query processors. We introduce a new rank-join algorithm that makes use of the individual orders of its inputs to produce join results ordered on a user-specified scoring function. The idea is to rank the join results progressively during the join operation. We introduce two physical query operators based on variants of ripple join that implement the rank-join algorithm. The operators are nonblocking and can be integrated into pipelined execution plans. We also propose an efficient heuristic designed to optimize a top-k join query by choosing the best join order. We address several practical issues and optimization heuristics to integrate the new join operators in practical query processors. We implement the new operators inside a prototype database engine based on PREDATOR. The experimental evaluation of our approach compares recent algorithms for joining ranked inputs and shows superior performance.

引用

页码：207 / 221

页数：15

共 22 条

[1] [Anonymous], 2000, IEEE DATA ENG B
[2] BRUNO N, 2002, ACM T DATABASE SYST, V27, P369
[3] BRUNO N, 2002, P IEEE 18 INT C DAT, P153
[4] Carey M. J., 1997, SIGMOD Record, V26, P219, DOI 10.1145/253262.253302
[5] Carey M. J., 1998, Proceedings of the Twenty-Fourth International Conference on Very-Large Databases, P158
[6] Chang K. C.-C., 2002, PROC ACM SIGMOD INT, P346
[7] SPEARMANS FOOTRULE AS A MEASURE OF DISARRAY
DIACONIS, P
GRAHAM, RL
[J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (02): : 262 - 268
[8] Diaconis P., 1988, IMS LECT SERIES, V11
[9] Dwork C., 2001, P 10 INT C WORLD WID, P613, DOI DOI 10.1145/371920.372165
[10] Optimal aggregation algorithms for middleware
Fagin, R
Lotem, A
Naor, M
[J]. JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2003, 66 (04) : 614 - 656

← 1 2 3 →