Methods for evaluating and creating data quality

Cited by: 57
Authors
Winkler, WE [1]
Affiliation
[1] US Bur Census, Div Stat Res, Washington, DC 20233 USA
Keywords
integer programming; set covering; data cleaning; approximate string comparison; unsupervised and supervised learning
DOI
10.1016/j.is.2003.12.003
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
This paper provides a survey of two classes of methods that can be used in determining and improving the quality of individual files or groups of files. The first are edit/imputation methods for maintaining business rules and for imputing for missing data. The second are methods of data cleaning for finding duplicates within files or across files. Published by Elsevier Ltd.
Pages: 531-550
Page count: 20
Related papers
75 in total
  • [1] ANANTHAKRISHNA R, 2003, VERY LARGE DATA BASE
  • [2] [Anonymous], ENTERPRISE KNOWLEDGE
  • [3] [Anonymous], BUSINESS SURVEY METH
  • [4] BARCAROLI G, 1997, STAT DATA EDITING, V2
  • [5] BARCAROLI G, 1993, P 49 SESS INT STAT I
  • [6] BELIN TR, 1995, J AM STAT ASSOC, V90, P694
  • [7] BERTOLAZZI P, 2003, IEEE WORKSH DAT QUAL
  • [8] BORTHWICK A, 2002, MEDD 2 0 C PRES NEW
  • [9] Bruni R., 2001, LOGIC OPTIMIZATION T
  • [10] Burkard R., 1980, Assignment and matching problems: Solution methods with FORTRAN programs