Protecting privacy using k-anonymity

Cited by: 179
Authors
El Emam, Khaled [1, 2]
Dankar, Fida Kamal [1]
Affiliations
[1] Childrens Hosp Eastern Ontario, Res Inst, Ottawa, ON K1J 8L1, Canada
[2] Univ Ottawa, Fac Med, Ottawa, ON, Canada
DOI
10.1197/jamia.M2716
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Objective: There is increasing pressure to share health information and even to make it publicly available. However, such disclosures of personal health information raise serious privacy concerns. To alleviate these concerns, the data can be anonymized before disclosure. One popular anonymization approach is k-anonymity. There have been no evaluations of the actual re-identification probability of k-anonymized data sets.
Design: Through a simulation, we evaluated the re-identification risk of k-anonymization and of three improvements to it, using three large data sets.
Measurement: Re-identification probability is measured under two different re-identification scenarios. Information loss is measured by the commonly used discernability metric.
Results: For one of the re-identification scenarios, k-anonymity consistently over-anonymizes data sets, and this over-anonymization is most pronounced with small sampling fractions. Over-anonymization results in excessive distortion of the data (i.e., high information loss), making the data less useful for subsequent analysis. We found that a hypothesis testing approach provided the best control over re-identification risk and reduced the extent of information loss compared to baseline k-anonymity.
Conclusion: Guidelines are provided on when to use the hypothesis testing approach instead of baseline k-anonymity.
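As an illustration of the quantities the abstract names, the following is a minimal sketch, not the authors' simulation. It assumes the textbook definitions: a data set is k-anonymous when every combination of quasi-identifier values is shared by at least k records, and the discernability metric is the sum of squared equivalence class sizes (with no suppressed records). The sample records, the field names, and the worst-case risk measure (1 divided by the smallest class size) are all hypothetical choices for this example; the abstract does not specify the paper's two re-identification scenarios or its hypothesis testing approach, so neither is modeled here.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every equivalence class contains at least k records."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return all(n >= k for n in sizes.values())


def discernability_metric(records, quasi_identifiers):
    """Sum of squared class sizes; larger values mean more information loss.

    Assumption: no records are suppressed, so no suppression penalty applies.
    """
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return sum(n * n for n in sizes.values())


def worst_case_risk(records, quasi_identifiers):
    """Illustrative risk measure: 1 / (smallest equivalence class size)."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return 1 / min(sizes.values())


# Hypothetical, already-generalized records (age is binned into ranges).
records = [
    {"age": "30-39", "sex": "F", "diagnosis": "asthma"},
    {"age": "30-39", "sex": "F", "diagnosis": "flu"},
    {"age": "30-39", "sex": "F", "diagnosis": "diabetes"},
    {"age": "40-49", "sex": "M", "diagnosis": "flu"},
    {"age": "40-49", "sex": "M", "diagnosis": "asthma"},
]
quasi_identifiers = ["age", "sex"]

print(is_k_anonymous(records, quasi_identifiers, 2))      # True
print(is_k_anonymous(records, quasi_identifiers, 3))      # False
print(discernability_metric(records, quasi_identifiers))  # 13
print(worst_case_risk(records, quasi_identifiers))        # 0.5
```

The example has two equivalence classes of sizes 3 and 2, so it is 2-anonymous but not 3-anonymous. Generalizing further (for instance, merging both age ranges and dropping sex) would reach 3-anonymity at the cost of a higher discernability value, which is the trade-off between risk and information loss that the abstract describes.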
Pages: 627-637
Page count: 11