Sampling the Repairs of Functional Dependency Violations under Hard Constraints

被引:66
作者
Beskales, George [1 ]
Ilyas, Ihab F. [1 ]
Golab, Lukasz [2 ]
机构
[1] Univ Waterloo, Waterloo, ON, Canada
[2] AT&T Labs Res, Red Bank, NJ USA
来源
PROCEEDINGS OF THE VLDB ENDOWMENT | 2010年 / 3卷 / 01期
关键词
Data cleaning - Exponential numbers - Functional dependency - Hard constraints - Integrity constraints - Web data extraction;
D O I
10.14778/1920841.1920870
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Violations of functional dependencies (FDs) are common in practice, often arising in the context of data integration or Web data extraction. Resolving these violations is known to be challenging for a variety of reasons, one of them being the exponential number of possible "repairs". Previous work has tackled this problem either by producing a single repair that is (nearly) optimal with respect to some metric, or by computing consistent answers to selected classes of queries without explicitly generating the repairs. In this paper, we propose a novel data cleaning approach that is not limited to finding a single repair or to a particular class of queries, namely, sampling from the space of possible repairs. We give several motivating scenarios where sampling from the space of FD repairs is desirable, propose a new class of useful repairs, and present an algorithm that randomly samples from this space. We also show how to restrict the space of generated repairs based on user-defined hard constraints that define an immutable trusted subset of the input relation, and we experimentally evaluate our algorithm against previous approaches. While this paper focuses on repairing FDs, we envision the proposed sampling approach to be applicable to other integrity constraints with large repair spaces.
引用
收藏
页码:197 / 207
页数:11
相关论文
共 15 条
[1]  
Abiteboul Serge, 1995, FDN DATABASES
[2]  
Afrati F.N., 2009, P INT C DAT THEOR IC, P31
[3]  
Arenas M., 1999, Proceedings of the Eighteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, P68, DOI 10.1145/303976.303983
[4]  
Benjelloun O., 2006, P 32 INT C VER LARG, P953
[5]  
Bohannon P., 2005, P ACM SIGMOD INT C M, P143
[6]   Minimal-change integrity maintenance using tuple deletions [J].
Chomicki, J ;
Marcinkowski, J .
INFORMATION AND COMPUTATION, 2005, 197 (1-2) :90-121
[7]  
Chomicki J., 2004, PROC 13 ACM INT C IN, P417, DOI [10.1145/1031171, DOI 10.1145/1031171]
[8]   First-order query rewriting for inconsistent databases [J].
Fuxman, Ariel ;
Miller, Renee J. .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2007, 73 (04) :610-635
[9]  
Greco S, 2008, LECT NOTES COMPUT SC, V5231, P311, DOI 10.1007/978-3-540-87877-3_23
[10]  
Jampani R., 2008, P P 2008 ACM SIGMOD, P687