A hybrid approach for scalable sub-tree anonymization over big data using MapReduce on cloud

Cited by: 78

Authors:

Zhang, Xuyun [1]

Liu, Chang [1]

Nepal, Surya [2]

Yang, Chi [1]

Dou, Wanchun [3]

Chen, Jinjun [1]

Affiliations:

[1] Univ Technol Sydney, Fac Engn & Informat Technol, Broadway, NSW 2007, Australia

[2] CSIRO, Ctr Informat & Commun Technol, Marsfield, NSW 2122, Australia

[3] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210093, Jiangsu, Peoples R China

Source:

JOURNAL OF COMPUTER AND SYSTEM SCIENCES | 2014 / Vol. 80 / Issue 5

Keywords:

Big data; Cloud computing; Data anonymization; Privacy preservation; MapReduce

DOI:

10.1016/j.jcss.2014.02.007

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline code:

0812

Abstract:

In big data applications, data privacy is one of the most pressing concerns, because processing large-scale privacy-sensitive data sets often requires computation resources provisioned by public cloud services. Sub-tree data anonymization is a widely adopted scheme for anonymizing data sets to preserve privacy. Top-Down Specialization (TDS) and Bottom-Up Generalization (BUG) are two ways to perform sub-tree anonymization. However, existing approaches to sub-tree anonymization lack parallelization capability and therefore do not scale to big data in the cloud. Moreover, TDS and BUG each perform poorly for certain values of the k-anonymity parameter k. In this paper, we propose a hybrid approach that combines TDS and BUG for efficient sub-tree anonymization over big data. Further, we design MapReduce algorithms for both components (TDS and BUG) to achieve high scalability. Experimental evaluation demonstrates that the hybrid approach significantly improves the scalability and efficiency of the sub-tree anonymization scheme over existing approaches. (c) 2014 Elsevier Inc. All rights reserved.
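To make the abstract's terms concrete, here is a minimal sketch, under stated assumptions, of the check that sub-tree anonymization revolves around: each quasi-identifier value is generalized to its ancestor in a "cut" of a taxonomy tree, records are grouped by their generalized values, and the table is k-anonymous if every group holds at least k records. The grouping is written as a map step and a reduce step simulated in memory in plain Python. It is not the paper's TDS, BUG, or hybrid MapReduce algorithm; the taxonomies (`EDUCATION_PARENT`, `SEX_PARENT`) and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Sequence, Set, Tuple

Record = Tuple[str, ...]
Key = Tuple[str, ...]

# Hypothetical toy taxonomy trees (child -> parent) for two quasi-identifiers.
EDUCATION_PARENT: Dict[str, str] = {
    "Bachelors": "Undergrad",
    "Associate": "Undergrad",
    "Masters": "Postgrad",
    "Doctorate": "Postgrad",
    "Undergrad": "ANY",
    "Postgrad": "ANY",
}
SEX_PARENT: Dict[str, str] = {"Male": "ANY", "Female": "ANY"}


def generalize(value: str, cut: Set[str], parent: Dict[str, str]) -> str:
    """Replace a value by its closest ancestor (or itself) that lies in the cut."""
    node = value
    while node not in cut:
        if node not in parent:
            raise ValueError(f"{value!r} has no ancestor in the cut")
        node = parent[node]
    return node


def map_phase(
    records: Iterable[Record],
    cuts: Sequence[Set[str]],
    parents: Sequence[Dict[str, str]],
) -> Iterator[Tuple[Key, int]]:
    """Map step: emit (generalized quasi-identifier tuple, 1) for each record."""
    for record in records:
        key = tuple(generalize(v, cut, par) for v, cut, par in zip(record, cuts, parents))
        yield key, 1


def reduce_phase(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, int]:
    """Shuffle and reduce steps, simulated in memory: sum the counts per key."""
    groups: Dict[Key, int] = defaultdict(int)
    for key, count in pairs:
        groups[key] += count
    return dict(groups)


def is_k_anonymous(
    records: Iterable[Record],
    cuts: Sequence[Set[str]],
    parents: Sequence[Dict[str, str]],
    k: int,
) -> bool:
    """True if every group of generalized records has at least k members."""
    counts = reduce_phase(map_phase(records, cuts, parents))
    return all(c >= k for c in counts.values())


if __name__ == "__main__":
    records = [
        ("Bachelors", "Male"),
        ("Associate", "Male"),
        ("Masters", "Female"),
        ("Doctorate", "Female"),
    ]
    parents = [EDUCATION_PARENT, SEX_PARENT]
    mid_cut = [{"Undergrad", "Postgrad"}, {"Male", "Female"}]
    print(is_k_anonymous(records, mid_cut, parents, 2))  # True
    print(is_k_anonymous(records, mid_cut, parents, 3))  # False
```

In this sketch the cut plays the role of the anonymization state: roughly, TDS starts from the root cut (`{"ANY"}`) and specializes toward the leaves, while BUG starts from the leaves and generalizes upward. The map step depends only on a single record, so it could be split across workers, and only the per-group counts need to be combined.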

Pages: 1008-1020

Number of pages: 13
