A Framework for Evaluating Privacy Preserving Data Mining Algorithms

Authors
Elisa Bertino
Igor Nai Fovino
Loredana Parasiliti Provenza
Affiliations
[1] Purdue University, CERIAS and CS Department
[2] Università degli Studi di Milano, Dipartimento di Informatica e Comunicazione
Source
Data Mining and Knowledge Discovery, 2005, Vol. 11
Keywords
Association Rule; Privacy Preserve; Minimum Support Threshold; Disclosure Risk; Privacy Level
Abstract
Recently, the research community working on security and knowledge discovery has developed a new class of data mining methods, known as privacy-preserving data mining (PPDM) algorithms. The aim of these algorithms is to extract relevant knowledge from large amounts of data while at the same time protecting sensitive information. Several data mining techniques that incorporate privacy protection mechanisms have been developed; they allow one to hide sensitive itemsets or patterns before the data mining process is executed. Privacy-preserving classification methods, instead, prevent a miner from building a classifier that is able to predict sensitive data. Additionally, privacy-preserving clustering techniques have recently been proposed, which distort sensitive numerical attributes while preserving the general features needed for clustering analysis. A crucial issue is to determine which of these privacy-preserving techniques best protects sensitive information. However, this is not the only criterion against which these algorithms can be evaluated. It is also important to assess the quality of the data resulting from the modifications applied by each algorithm, as well as the performance of the algorithms. There is thus a need to identify a comprehensive set of criteria against which to assess the existing PPDM algorithms and to determine which algorithm meets specific requirements.
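The abstract describes the tension an evaluation must capture: hiding sensitive itemsets versus preserving the quality of the data for mining. As a minimal sketch, assuming a toy transaction database and a naive item-deletion heuristic (this is not the authors' framework or any specific published hiding algorithm), the Python below lowers the support count of one sensitive itemset below a minimum support threshold and then reports two illustrative measures: whether the sensitive itemset is still frequent (`hiding_failure`) and the fraction of non-sensitive frequent itemsets lost as a side effect (`misses_cost`). All function names, measure names, and data are hypothetical.

```python
from itertools import combinations

def support_count(db, itemset):
    """Number of transactions that contain every item in `itemset`."""
    s = set(itemset)
    return sum(1 for t in db if s <= t)

def frequent_itemsets(db, min_count, max_len=2):
    """Brute-force enumeration of frequent itemsets of up to `max_len` items (toy sizes only)."""
    items = sorted(set().union(*db))
    result = set()
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            if support_count(db, combo) >= min_count:
                result.add(frozenset(combo))
    return result

def hide_itemset(db, sensitive, min_count):
    """Hypothetical naive sanitization: delete one item of `sensitive` from
    supporting transactions until its support count drops below `min_count`."""
    sanitized = [set(t) for t in db]  # copy, so the original database is untouched
    victim = min(sensitive)           # assumption: always remove the same item
    for t in sanitized:
        if support_count(sanitized, sensitive) < min_count:
            break
        if set(sensitive) <= t:
            t.discard(victim)
    return sanitized

def evaluate(db, sanitized, sensitive, min_count, max_len=2):
    """Return (hiding_failure, misses_cost) for one sensitive itemset."""
    sens = frozenset(sensitive)
    before = frequent_itemsets(db, min_count, max_len)
    after = frequent_itemsets(sanitized, min_count, max_len)
    # Privacy side: is the sensitive itemset still frequent after sanitization?
    hiding_failure = support_count(sanitized, sens) >= min_count
    # Data-quality side: non-sensitive frequent itemsets that were lost.
    nonsensitive = {f for f in before if not sens <= f}
    lost = nonsensitive - after
    misses_cost = len(lost) / len(nonsensitive) if nonsensitive else 0.0
    return hiding_failure, misses_cost

db = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"bread", "milk", "beer"},
    {"bread", "eggs"},
    {"milk", "beer"},
]

if __name__ == "__main__":
    sensitive = frozenset({"bread", "milk"})
    sanitized = hide_itemset(db, sensitive, min_count=2)
    hf, mc = evaluate(db, sanitized, sensitive, min_count=2)
    print(hf, round(mc, 3))  # False 0.167
```

On this toy data the sensitive itemset {bread, milk} is hidden, but the non-sensitive itemset {bread, eggs} also falls below the threshold, so one of six non-sensitive frequent itemsets is lost. A single number cannot describe both effects, which is why the abstract argues for assessing protection of sensitive information and data quality as separate criteria.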
Pages: 121–154