A hybrid approach for scalable sub-tree anonymization over big data using MapReduce on cloud

Cited by: 78
Authors
Zhang, Xuyun [1]
Liu, Chang [1]
Nepal, Surya [2]
Yang, Chi [1]
Dou, Wanchun [3]
Chen, Jinjun [1]
Affiliations
[1] Univ Technol Sydney, Fac Engn & Informat Technol, Broadway, NSW 2007, Australia
[2] CSIRO, Ctr Informat & Commun Technol, Marsfield, NSW 2122, Australia
[3] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210093, Jiangsu, Peoples R China
Keywords
Big data; Cloud computing; Data anonymization; Privacy preservation; MapReduce
DOI
10.1016/j.jcss.2014.02.007
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
In big data applications, data privacy is one of the foremost concerns because processing large-scale privacy-sensitive data sets often requires computation resources provisioned by public cloud services. Sub-tree data anonymization is a widely adopted scheme for anonymizing data sets to preserve privacy. Top-Down Specialization (TDS) and Bottom-Up Generalization (BUG) are two ways to achieve sub-tree anonymization. However, existing approaches to sub-tree anonymization lack parallelization capability and therefore do not scale to big data in the cloud. Moreover, TDS and BUG each perform poorly for certain values of the k-anonymity parameter. In this paper, we propose a hybrid approach that combines TDS and BUG for efficient sub-tree anonymization over big data. Further, we design MapReduce algorithms for the two components (TDS and BUG) to achieve high scalability. Experimental evaluation demonstrates that the hybrid approach significantly improves the scalability and efficiency of the sub-tree anonymization scheme over existing approaches. © 2014 Elsevier Inc. All rights reserved.
Pages: 1008-1020
Number of pages: 13
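
The abstract names sub-tree anonymization and the k-anonymity parameter without showing either. The following is a minimal sketch of the idea, not the paper's algorithm: a "cut" through a taxonomy tree decides how far each value is generalized, and a map-then-count step checks whether every resulting group has at least k records. The taxonomy, the records, and the function names (`generalize`, `group_sizes`, `is_k_anonymous`) are all hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical taxonomy tree for one quasi-identifier (child -> parent).
PARENT = {
    "Engineer": "Professional",
    "Lawyer": "Professional",
    "Dancer": "Artist",
    "Writer": "Artist",
    "Professional": "Any",
    "Artist": "Any",
}

def generalize(value, cut):
    """Walk up the taxonomy until reaching a node that is in the cut."""
    while value not in cut:
        value = PARENT[value]  # raises KeyError if the cut does not cover the value
    return value

def group_sizes(records, cut):
    """Map each record to its generalized value, then count per group.

    This mirrors a map step (emit generalized value) followed by a
    reduce step (sum per key), here run in a single process.
    """
    return Counter(generalize(r, cut) for r in records)

def is_k_anonymous(records, cut, k):
    """True if every group of generalized records has at least k members."""
    return all(n >= k for n in group_sizes(records, cut).values())

# Hypothetical data set with a single quasi-identifier column.
records = ["Engineer", "Engineer", "Lawyer", "Dancer", "Writer", "Writer"]

leaf_cut = {"Engineer", "Lawyer", "Dancer", "Writer"}  # no generalization
mid_cut = {"Professional", "Artist"}                   # both sub-trees generalized

print(is_k_anonymous(records, leaf_cut, 2))
print(is_k_anonymous(records, mid_cut, 2))
```

In this sketch, a top-down search would begin at the cut `{"Any"}` and move the cut downward while the k-anonymity check still passes, whereas a bottom-up search would begin at `leaf_cut` and move the cut upward until the check passes.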