Qualitative Data Cleaning

Cited by: 22
Authors
Chu, Xu [1]
Ilyas, Ihab F. [1]
Affiliations
[1] Univ Waterloo, Waterloo, ON N2L 3G1, Canada
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2016 / Vol. 9 / No. 13
DOI
10.14778/3007263.3007320
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Data quality is one of the most important problems in data management, since dirty data often leads to inaccurate data analytics results and wrong business decisions. A data cleaning exercise often consists of two phases: error detection and error repairing. Error detection techniques can be either quantitative or qualitative, and error repairing is performed by applying data transformation scripts, by involving human experts, or sometimes both. In this tutorial, we discuss the main facets and directions in designing qualitative data cleaning techniques. We present a taxonomy of current qualitative error detection techniques, as well as a taxonomy of current data repairing techniques. We also discuss proposals for tackling the challenges of cleaning "big data" in terms of scale and distribution.
Pages: 1605-1608
Page count: 4