Methods for evaluating and creating data quality

Cited by: 57

Authors:

Winkler, WE [1]

Affiliations:

[1] US Bur Census, Div Stat Res, Washington, DC 20233 USA

Source:

INFORMATION SYSTEMS | 2004 / Vol. 29 / No. 7

Keywords:

integer programming; set covering; data cleaning; approximate string comparison; unsupervised and supervised learning

DOI:

10.1016/j.is.2003.12.003

Chinese Library Classification:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

This paper provides a survey of two classes of methods that can be used in determining and improving the quality of individual files or groups of files. The first are edit/imputation methods for maintaining business rules and for imputing for missing data. The second are methods of data cleaning for finding duplicates within files or across files. Published by Elsevier Ltd.


Pages: 531-550

Page count: 20
