On unifying privacy and uncertain data models

Cited by: 20
Author
Aggarwal, Charu C. [1]
Affiliation
[1] IBM TJ Watson Res Ctr, Hawthorne, NY 10532 USA
Source
2008 IEEE 24TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3 | 2008
DOI
10.1109/ICDE.2008.4497447
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The problem of privacy-preserving data mining has been studied extensively in recent years because of the increased amount of personal information that is available to corporations and individuals. Most privacy transformations use some form of data perturbation or representational ambiguity in order to reduce the risk of identification. The final results from privacy transformation methods often require the underlying applications to be modified in order to work with the new representation of the data. Since the end results of privacy-transformation methods have not been standardized, the required modifications may vary with the method used for the privacy transformation. In some cases, it can be an enormous effort to re-design applications to work with the anonymized data. While the results of privacy-transformation methods are a natural form of uncertain data, the two problems have generally been studied independently. In this paper, we make a first attempt to unify the two fields, and propose a privacy transformation for which existing uncertain data management tools can be directly used. This is a great advantage, since it means that the wide spectrum of research available for uncertain data management can also be used for privacy-preserving data mining. We propose an uncertain version of the k-anonymity model which is related to the well-known deterministic model of k-anonymity. The uncertain version of the k-anonymity model has the additional feature of introducing greater uncertainty for the adversary over an equivalent deterministic model. As specific instantiations of this approach, we test the effectiveness of the privacy transformation on the problems of query estimation and classification, and show that the technique retains greater accuracy than other k-anonymity models.
Pages: 386-395
Page count: 10