Practical data-oriented microaggregation for statistical disclosure control

Cited by: 334
Authors
Domingo-Ferrer, J
Mateo-Sanz, JM
Affiliations
[1] Univ Rovira & Virgili, Dept Comp Sci, E-43006 Tarragona, Catalonia, Spain
[2] Univ Rovira & Virgili, Dept Chem Engn, Stat & OR Grp, E-43006 Tarragona, Catalonia, Spain
Keywords
statistical databases; microdata protection; statistical disclosure control; microaggregation; hierarchical clustering; genetic algorithms
DOI
10.1109/69.979982
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Microaggregation is a statistical disclosure control technique for microdata disseminated in statistical databases. Raw microdata (i.e., individual records or data vectors) are grouped into small aggregates prior to publication. Each aggregate should contain at least k data vectors to prevent disclosure of individual information, where k is a constant value preset by the data protector. No exact polynomial algorithms are known to date to microaggregate optimally, i.e., with minimal variability loss. Methods in the literature rank data and partition them into groups of fixed size; in the multivariate case, ranking is performed by projecting data vectors onto a single axis. In this paper, candidate optimal solutions to the multivariate and univariate microaggregation problems are characterized. In the univariate case, two heuristics based on hierarchical clustering and genetic algorithms are introduced; they are data-oriented in that they try to preserve natural data aggregates. In the multivariate case, fixed-size and hierarchical clustering microaggregation algorithms are presented which do not require data to be projected onto a single dimension; such methods clearly reduce variability loss as compared to conventional multivariate microaggregation on projected data.
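The conventional approach that the abstract contrasts against (rank the data, then cut it into groups of fixed size) can be illustrated with a minimal Python sketch for the univariate case. This is not the paper's data-oriented heuristics. The function names `microaggregate_fixed_size` and `information_loss`, the choice to merge leftover records into the last group, and the use of SSE/SST as the measure of variability loss are assumptions made for illustration.

```python
from statistics import mean


def microaggregate_fixed_size(values, k):
    """Univariate fixed-size microaggregation (illustrative sketch).

    Sort the values, cut the sorted order into consecutive groups of k,
    and replace every value by the mean of its group. Leftover records
    (when len(values) is not a multiple of k) are merged into the last
    group, so every group holds at least k records. That merging rule
    is an assumption of this sketch.
    """
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k values")
    # Indices of the records in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = len(values) // k
    masked = [0.0] * len(values)
    for g in range(n_groups):
        start = g * k
        # The last group absorbs any remainder.
        end = (g + 1) * k if g < n_groups - 1 else len(values)
        members = order[start:end]
        centroid = mean(values[i] for i in members)
        for i in members:
            masked[i] = centroid
    return masked


def information_loss(values, masked):
    """Within-group sum of squares over total sum of squares (SSE/SST).

    Assumed here as the variability-loss measure: 0 means no loss,
    1 means all variability was removed.
    """
    overall = mean(values)
    sse = sum((v - m) ** 2 for v, m in zip(values, masked))
    sst = sum((v - overall) ** 2 for v in values)
    return sse / sst if sst else 0.0


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 13.0]
    published = microaggregate_fixed_size(data, k=3)
    print(published)
    print(information_loss(data, published))
```

Because the cut points depend only on rank, a fixed-size partition can split a natural cluster across two groups; that is the weakness the data-oriented heuristics described in the abstract are meant to address.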
Pages: 189-201
Number of pages: 13