Probabilistic top-k and ranking-aggregate queries

被引:46
作者
Soliman, Mohamed A. [1 ]
Ilyas, Ihab F. [1 ]
Chang, Kevin Chen-Chuan [2 ]
机构
[1] Univ Waterloo, Waterloo, ON N2L 3G1, Canada
[2] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
来源
ACM TRANSACTIONS ON DATABASE SYSTEMS | 2008年 / 33卷 / 03期
关键词
algorithms; design; experimentation; performance; query processing; probabilistic data; top-k; ranking; aggregation;
D O I
10.1145/1386118.1386119
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Ranking and aggregation queries are widely used in data exploration, data analysis, and decision-making scenarios. While most of the currently proposed ranking and aggregation techniques focus on deterministic data, several emerging applications involve data that is unclean or uncertain. Ranking and aggregating uncertain (probabilistic) data raises new challenges in query semantics and processing, making conventional methods inapplicable. Furthermore, uncertainty imposes probability as a new ranking dimension that does not exist in the traditional settings. In this article we introduce new probabilistic formulations for top-k and ranking-aggregate queries in probabilistic databases. Our formulations are based on marriage of traditional top-k semantics with possible worlds semantics. In the light of these formulations, we construct a generic processing framework supporting both query types, and leveraging existing query processing and indexing capabilities in current RDBMSs. The framework encapsulates a state space model and efficient search algorithms to compute query answers. Our proposed techniques minimize the number of accessed tuples and the size of materialized search space to compute query answers. Our experimental study shows the efficiency of our techniques under different data distributions with orders of magnitude improvement over naive methods.
引用
收藏
页数:54
相关论文
共 35 条
[1]  
Abiteboul S., 1987, SIGMOD Record, V16, P34, DOI 10.1145/38714.38724
[2]  
Andritsos Periklis, 2006, P ICDE C, P30
[3]  
[Anonymous], 2007, PROC IEEE 23 INT C D
[4]  
[Anonymous], P 1997 ACM SIGMOD IN
[5]  
[Anonymous], 2006, VLDB
[6]  
Bhattacharya I., 2006, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, P529
[7]   Evaluation of probabilistic queries over imprecise data in constantly-evolving environments [J].
Cheng, Reynold ;
Kalashnikov, Dmitri V. ;
Prabhakar, Sunil .
INFORMATION SYSTEMS, 2007, 32 (01) :104-130
[8]   Efficient query evaluation on probabilistic databases [J].
Dalvi, Nilesh ;
Suciu, Dan .
VLDB JOURNAL, 2007, 16 (04) :523-544
[9]   Optimal aggregation algorithms for middleware [J].
Fagin, R ;
Lotem, A ;
Naor, M .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2003, 66 (04) :614-656
[10]  
FUXMAN A, 2005, P ACM SIGMOD INT C M, P155