A framework for evaluating privacy preserving data mining algorithms

Cited by: 80
Authors
Bertino, E [1]
Fovino, IN
Provenza, LP
Affiliations
[1] Purdue Univ, CERIAS, W Lafayette, IN 47907 USA
[2] Purdue Univ, CS Dept, W Lafayette, IN 47907 USA
[3] Univ Milan, Dipartimento Informat & Comunicaz, I-20122 Milan, Italy
Keywords
Clustering techniques - Itemsets - Privacy protection - Privacy-preserving data mining (PPDM)
DOI
10.1007/s10618-005-0006-6
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, a new class of data mining methods, known as privacy preserving data mining (PPDM) algorithms, has been developed by the research community working on security and knowledge discovery. The aim of these algorithms is to extract relevant knowledge from large amounts of data while protecting sensitive information. Several data mining techniques incorporating privacy protection mechanisms have been developed that allow one to hide sensitive itemsets or patterns before the data mining process is executed. Privacy preserving classification methods, instead, prevent a miner from building a classifier that is able to predict sensitive data. Additionally, privacy preserving clustering techniques have recently been proposed, which distort sensitive numerical attributes while preserving general features for clustering analysis. A crucial issue is to determine which of these privacy-preserving techniques better protect sensitive information. However, this is not the only criterion with respect to which these algorithms can be evaluated. It is also important to assess the quality of the data resulting from the modifications applied by each algorithm, as well as the performance of the algorithms. There is thus a need to identify a comprehensive set of criteria against which to assess existing PPDM algorithms and determine which algorithm meets specific requirements. In this paper, we present a first evaluation framework for estimating and comparing different kinds of PPDM algorithms. Then, we apply our criteria to a specific set of algorithms and discuss the evaluation results we obtain. Finally, some considerations about future work and promising directions in the context of privacy preservation in data mining are discussed.
Pages: 121-154
Number of pages: 34