Improving Mapreduce for Incremental Processing Using Map Data Storage

Cited by: 2
Authors
Anandkrishna, R. [1]
Kumar, Dhananjay [1]
Affiliation
[1] Anna Univ, Dept Informat Technol, MIT Campus, Chennai, Tamil Nadu, India
Source
FOURTH INTERNATIONAL CONFERENCE ON RECENT TRENDS IN COMPUTER SCIENCE & ENGINEERING (ICRTCSE 2016) | 2016 / Vol. 87
Keywords
MapReduce; Incremental Processing; Hadoop; Distributed computing; Bloom Filter;
DOI
10.1016/j.procs.2016.05.163
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
In this paper, we propose methods to improve the performance of a MapReduce program used for incremental processing. Incremental processing is typically used where data is refreshed periodically to reflect small changes to the input dataset. To reduce the delay caused by re-computing unchanged data, we introduce methods that selectively compute only the data that has been altered. The approach incorporates a Bloom filter, a space-efficient data structure that can check, with a certain probability, whether data has been modified. Traditional systems process the entire dataset even when only a small percentage of the data, or none of it, has changed. This is time-consuming and spends a large number of additional CPU clock cycles on data that has not changed. To reduce this waste of CPU clock cycles, we propose a system in which a Bloom filter-based method of execution improves performance by up to 17% compared with the existing system. © 2016 The Authors. Published by Elsevier B.V.
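The idea of using a Bloom filter to skip unchanged input can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class `BloomFilter`, the helper `select_changed`, the filter size, the number of hash functions, and the use of SHA-256 with double hashing are all assumptions chosen for illustration. Records from the previous run are added to the filter; on the next run, only records the filter reports as definitely unseen are passed on for recomputation. Because a Bloom filter can return false positives, a changed record may occasionally be mistaken for an unchanged one, which is the "certain probability" the abstract refers to.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter (illustrative): k bit positions per item in an m-bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely never added"; True means "possibly added".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def select_changed(previous_records, current_records):
    """Return the current records that the filter reports as definitely new.

    Hypothetical helper: only these records would be handed to the map phase.
    """
    seen = BloomFilter()
    for record in previous_records:
        seen.add(record)
    return [r for r in current_records if not seen.might_contain(r)]


if __name__ == "__main__":
    previous = ["a,1", "b,2", "c,3"]
    current = ["a,1", "b,5", "c,3"]
    # Unchanged records are never selected; the modified record is selected
    # unless it happens to collide with bits already set (a false positive).
    print(select_changed(previous, current))
```

A larger bit array or a different number of hash functions lowers the false-positive rate at the cost of memory, so in a sketch like this the filter parameters would be tuned to the expected number of records.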
Pages: 288-293
Number of pages: 6