A privacy-preserving technique for Euclidean distance-based mining algorithms using Fourier-related transforms

Cited by: 63
Authors
Mukherjee, Shibnath [1]
Chen, Zhiyuan [1]
Gangopadhyay, Aryya [1]
Affiliation
[1] Univ Maryland Baltimore Cty, Dept Informat Syst, Baltimore, MD 21250 USA
Keywords
privacy; data mining; Fourier transform
DOI
10.1007/s00778-006-0010-5
CLC classification number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Privacy-preserving data mining has become increasingly popular because it allows privacy-sensitive data to be shared for analysis. However, existing techniques such as random perturbation perform poorly with Euclidean distance-based mining algorithms, which are simple, efficient, and widely used. Although the original data distributions can be reconstructed fairly accurately from the perturbed data, the distances between individual data points are not preserved, which leads to poor accuracy for distance-based mining methods. In addition, these techniques generally do not address data reduction. Studies on secure multi-party computation, on the other hand, often concentrate on techniques tailored to very specific mining algorithms and scenarios; they require modifying the mining algorithms and are often difficult to generalize to other algorithms or scenarios. This paper proposes a novel, generalized approach that uses the well-known energy compaction property of Fourier-related transforms to hide sensitive data values while approximately preserving Euclidean distances, with high accuracy, in both centralized and distributed scenarios. Three algorithms for selecting the most important transform coefficients are presented: one for a centralized database, one for a horizontally partitioned database, and one for a vertically partitioned database. Experimental results demonstrate the effectiveness of the proposed approach.
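The abstract's central idea, that an orthonormal Fourier-related transform lets one publish only a few high-energy coefficients while approximately preserving Euclidean distances, can be illustrated with a minimal sketch. It assumes an orthonormal DCT-II as the transform and a simple, hypothetical selection rule for the centralized case (keep the k coefficient positions with the largest total energy across all records). This is not the paper's coefficient-selection algorithm, and it does not model the privacy analysis at all. The helper names `dct_ortho`, `select_coefficients`, `release`, and `euclidean` are illustrative.

```python
import math


def dct_ortho(x):
    """Orthonormal DCT-II of a sequence x (O(n^2), for illustration only)."""
    n = len(x)
    out = []
    for k in range(n):
        # Orthonormal scaling: sqrt(1/n) for the DC term, sqrt(2/n) otherwise.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(scale * s)
    return out


def select_coefficients(transformed, k):
    """Hypothetical selection rule (an assumption, not the paper's algorithm):
    keep the k coefficient positions with the largest total energy
    (sum of squared values) across all transformed records."""
    n = len(transformed[0])
    energy = [sum(row[j] ** 2 for row in transformed) for j in range(n)]
    ranked = sorted(range(n), key=lambda j: energy[j], reverse=True)
    return sorted(ranked[:k])


def release(data, k):
    """Transform every record and publish only the selected coefficients."""
    transformed = [dct_ortho(row) for row in data]
    keep = select_coefficients(transformed, k)
    published = [[row[j] for j in keep] for row in transformed]
    return published, keep


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0, 4.0], [2.0, 2.5, 2.0, 1.0], [0.5, 0.0, 1.5, 3.0]]
    published, keep = release(data, 2)
    print("kept positions:", keep)
    print("original distance: ", euclidean(data[0], data[1]))
    print("published distance:", euclidean(published[0], published[1]))
```

Because the transform is orthonormal, keeping all n coefficients reproduces every pairwise distance exactly (Parseval's relation). Keeping only k < n coefficients drops non-negative squared terms, so each published distance is a lower bound on the true one; energy compaction is what tends to keep that bound close when most of the energy sits in a few coefficients.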
Pages: 293-315
Number of pages: 23