On static and dynamic methods for condensation-based privacy-preserving data mining

Cited by: 46
Authors
Aggarwal, Charu C. [1]
Yu, Philip S. [1]
Affiliations
[1] IBM TJ Watson Res Ctr, Hawthorne, NY 10532 USA
Source
ACM Transactions on Database Systems | 2008 / Vol. 33 / No. 1
Keywords
databases; algorithms; privacy; data mining; k-anonymity
DOI
10.1145/1331904.1331906
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In recent years, privacy-preserving data mining has become an important problem because of the large amount of personal data which is tracked by many business applications. In many cases, users are unwilling to provide personal information unless the privacy of sensitive information is guaranteed. In this paper, we propose a new framework for privacy-preserving data mining of multidimensional data. Previous work for privacy-preserving data mining uses a perturbation approach which reconstructs data distributions in order to perform the mining. Such an approach treats each dimension independently and therefore ignores the correlations between the different dimensions. In addition, it requires the development of a new distribution-based algorithm for each data mining problem, since it does not use the multidimensional records, but uses aggregate distributions of the data as input. This leads to a fundamental re-design of data mining algorithms. In this paper, we will develop a new and flexible approach for privacy-preserving data mining that does not require new problem-specific algorithms, since it maps the original data set into a new anonymized data set. These anonymized data closely match the characteristics of the original data including the correlations among the different dimensions. We will show how to extend the method to the case of data streams. We present empirical results illustrating the effectiveness of the method. We also show the efficiency of the method for data streams.
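The mapping described in the abstract, from original records to anonymized records that keep the correlations between dimensions, can be illustrated with a small sketch. This is not the authors' algorithm as published: the function names (`condense`, `synthesize`, `group_stats`, `cholesky_psd`), the seed-plus-nearest-neighbours grouping, the handling of leftover records, and the use of Gaussian sampling through a Cholesky factor are all assumptions made for illustration. The only idea taken from the abstract is that each group of at least k records is replaced by synthetic records drawn to match that group's mean and covariance.

```python
import math
import random


def group_stats(group):
    """Return (count, mean vector, covariance matrix) for one group of records."""
    n, d = len(group), len(group[0])
    mean = [sum(r[j] for r in group) / n for j in range(d)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in group) / n
            for j in range(d)] for i in range(d)]
    return n, mean, cov


def cholesky_psd(cov, eps=1e-12):
    """Lower-triangular L with L * L^T == cov; near-zero pivots are skipped."""
    d = len(cov)
    L = [[0.0] * d for _ in range(d)]
    for j in range(d):
        pivot = cov[j][j] - sum(L[j][p] ** 2 for p in range(j))
        if pivot <= eps:
            continue  # degenerate direction: leave column j at zero
        L[j][j] = math.sqrt(pivot)
        for i in range(j + 1, d):
            partial = sum(L[i][p] * L[j][p] for p in range(j))
            L[i][j] = (cov[i][j] - partial) / L[j][j]
    return L


def condense(records, k, rng):
    """Group records into clusters of at least k and keep only their statistics.

    Assumed grouping rule: pick a random seed record, take its k-1 nearest
    neighbours, repeat; the final 'remaining' records form the last group.
    """
    if len(records) < k:
        raise ValueError("need at least k records")
    remaining = [list(r) for r in records]
    groups = []
    while len(remaining) >= 2 * k:
        seed = remaining.pop(rng.randrange(len(remaining)))
        remaining.sort(key=lambda r: sum((a - b) ** 2 for a, b in zip(r, seed)))
        groups.append([seed] + remaining[:k - 1])
        remaining = remaining[k - 1:]
    groups.append(remaining)  # holds between k and 2k-1 records
    return [group_stats(g) for g in groups]


def synthesize(stats, rng):
    """Draw one synthetic record per original record from each group's statistics."""
    out = []
    for n, mean, cov in stats:
        L = cholesky_psd(cov)
        d = len(mean)
        for _ in range(n):
            z = [rng.gauss(0.0, 1.0) for _ in range(d)]
            out.append([mean[i] + sum(L[i][p] * z[p] for p in range(i + 1))
                        for i in range(d)])
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    # Two strongly correlated dimensions (hypothetical toy data).
    data = [[float(i), 2.0 * i + rng.uniform(-1.0, 1.0)] for i in range(23)]
    anonymized = synthesize(condense(data, 5, rng), rng)
    print(len(anonymized), "synthetic records generated")
```

Because the output is an ordinary multidimensional data set, an existing mining algorithm could in principle be run on it unchanged, which is the property the abstract emphasizes. A larger k gives each record more company in its group at the cost of coarser statistics.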
Pages: 39