To Overlap or Not to Overlap: Optimizing Incremental MapReduce Computations for On-Demand Data Upload

Cited by: 2
Authors
Ene, Stefan [1]
Nicolae, Bogdan [2]
Costan, Alexandru [3]
Antoniu, Gabriel [4]
Affiliations
[1] Univ Politehn Bucuresti, Bucharest, Romania
[2] IBM Res, Dublin, Ireland
[3] INSA Rennes, IRISA, Rennes, France
[4] Inria Rennes, Rennes, France
Source
2014 5th International Workshop on Data-Intensive Computing in the Clouds (DataCloud) | 2014
Keywords
MapReduce; data management; incremental processing
DOI
10.1109/DataCloud.2014.7
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Research on cloud-based Big Data analytics has so far focused on optimizing the performance and cost-effectiveness of the computations, while largely neglecting an important aspect: users need to upload massive datasets to the cloud for their computations. This paper studies the problem of running MapReduce applications while jointly optimizing the performance and cost of the data upload and of the corresponding computation, taken together. We analyze the feasibility of incremental MapReduce approaches that advance the computation as much as possible during the data upload by using the already transferred data to calculate intermediate results. Our key finding is that overlapping the transfer time with as many incremental computations as possible is not always efficient: a better solution is to wait until enough data has been uploaded to fill the computational capacity of the MapReduce cluster. Results show significant performance and cost reductions compared with state-of-the-art solutions that leverage incremental computations in a naive fashion.
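The trade-off in the key finding can be illustrated with a deliberately simplified model. The sketch below is not the authors' algorithm or cost model; it is a hypothetical discrete-event simulation that assumes: chunks finish uploading one at a time at a fixed interval; incremental jobs run one after another, each processing every chunk that has arrived but has not yet been processed; each job pays a fixed overhead (the hypothetical `job_overhead`, standing in for job startup and merging of intermediate results); and map tasks run in waves of `slots`. A job is launched once `threshold` unprocessed chunks are available (or the upload has finished), so `threshold=1` approximates naive overlapping, `threshold=slots` waits until the cluster can be filled, and `threshold=n_chunks` waits for the whole upload. All names and parameter values are illustrative, and only completion time is modeled, not monetary cost.

```python
import math


def simulate(n_chunks, upload_interval, slots, map_time, job_overhead, threshold):
    """Toy model: return (completion_time, number_of_jobs) for one launch policy.

    Hypothetical assumptions: chunk i finishes uploading at (i + 1) * upload_interval;
    incremental jobs run sequentially; each job processes all chunks that have arrived
    but are not yet processed, costing job_overhead plus one map_time per wave of
    `slots` parallel map tasks.
    """
    arrivals = [(i + 1) * upload_interval for i in range(n_chunks)]
    t = 0.0
    processed = 0
    jobs = 0
    while processed < n_chunks:
        arrived = sum(1 for a in arrivals if a <= t)
        pending = arrived - processed
        upload_done = arrived == n_chunks
        if pending >= threshold or (upload_done and pending > 0):
            # Launch one incremental job over every pending chunk.
            waves = math.ceil(pending / slots)
            t += job_overhead + waves * map_time
            processed += pending
            jobs += 1
        else:
            # Cluster stays idle until the next chunk finishes uploading.
            t = arrivals[arrived]
    return t, jobs


params = dict(n_chunks=16, upload_interval=1.0, slots=4, map_time=2.0, job_overhead=3.0)
for name, threshold in [("naive overlap", 1), ("fill capacity", 4), ("wait for all", 16)]:
    print(name, simulate(threshold=threshold, **params))
# naive overlap (25.0, 4)
# fill capacity (23.0, 3)
# wait for all (27.0, 1)
```

With these made-up parameters, waiting until the cluster can be filled finishes earliest (23 time units): it pays the per-job overhead fewer times than naive overlapping (25) without idling until the whole upload completes (27). Other parameter choices can change the ranking, which is consistent with the observation that maximal overlap is not always the best policy.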
Pages: 9-16
Page count: 8