Improving MapReduce for Incremental Processing Using Map Data Storage

Cited by: 2

Authors:

Anandkrishna, R. [1]

Kumar, Dhananjay [1]

Affiliations:

[1] Anna Univ, Dept Informat Technol, MIT Campus, Chennai, Tamil Nadu, India

Source:

FOURTH INTERNATIONAL CONFERENCE ON RECENT TRENDS IN COMPUTER SCIENCE & ENGINEERING (ICRTCSE 2016) | 2016 / Vol. 87

Keywords:

MapReduce; Incremental Processing; Hadoop; Distributed computing; Bloom Filter;

DOI:

10.1016/j.procs.2016.05.163

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Subject classification code:

081202 ;

Abstract:

In this paper, we propose methods for improving the performance of a MapReduce program used for incremental processing. Incremental processing is generally used where data is refreshed periodically to reflect small changes to the input dataset. To reduce the delay of re-computing unchanged data, we introduce methods that selectively compute only the data that has been altered. These methods incorporate a Bloom filter, a space-efficient data structure that can check, with a certain probability, whether data has been modified. Traditional systems process the entire dataset even when only a small percentage of the data, or none of it, has changed. This is time-consuming and spends a large number of additional CPU clock cycles on data that has not changed. To reduce this waste of CPU clock cycles, we propose a system in which execution using a Bloom filter improves performance by up to 17% compared with the existing system. (C) 2016 The Authors. Published by Elsevier B.V.
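The idea of using a Bloom filter to skip unchanged input can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not describe their data layout or hashing scheme, so the class `BloomFilter`, the helper `changed_records`, the filter size, and the number of hash functions are all assumptions made for illustration. The sketch records every input line from a previous run in the filter and, on the next run, forwards only the lines the filter has not seen.

```python
import hashlib


class BloomFilter:
    """Hypothetical fixed-size Bloom filter (illustrative, not from the paper)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive several bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def changed_records(previous_filter, records):
    """Return the records that the previous run's filter has not seen."""
    return [r for r in records if r.encode("utf-8") not in previous_filter]


# Previous run: remember every input record.
old_records = ["a,1", "b,2", "c,3"]
seen = BloomFilter()
for record in old_records:
    seen.add(record.encode("utf-8"))

# Next run: one record was altered; only it needs to be recomputed.
new_records = ["a,1", "b,2", "c,4"]
to_process = changed_records(seen, new_records)
```

A Bloom filter never reports a previously added item as absent, so unchanged records are always skipped. It can, with a small probability that depends on the filter size and the number of hash functions, report an unseen item as present, in which case a changed record would be skipped by mistake. This is the probabilistic behaviour the abstract refers to.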


Pages: 288-293

Number of pages: 6

