A Comparison of MapReduce and Spark for Big Data Analysis

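Cited by: 78
Authors
Wu Xindong (吴信东) [1, 2]
Ji Shengwei (嵇圣硙) [1]
Affiliations
[1] School of Computer Science and Information Engineering, Hefei University of Technology
[2] School of Computing and Informatics, University of Louisiana at Lafayette
Funding
National Key Research and Development Program of China
Keywords
big data; MapReduce; Spark; iterative problems; non-iterative problems
DOI
10.13328/j.cnki.jos.005557
CLC number
TP311.13
Discipline classification code
1201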
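Abstract
This paper reviews MapReduce and Spark, two computing algorithms and architectures for big data. It analyzes and compares them in terms of background, principles, and application scenarios, and summarizes the advantages and limitations of each. For non-iterative problems, MapReduce, owing to its task scheduling strategy and shuffle mechanism, outperforms Spark in the volume of intermediate data transferred and in the number of files. For iterative problems and some low-latency problems, Spark can partition tasks more sensibly according to the dependencies among data; compared with MapReduce, this effectively reduces the volume of intermediate data transferred and the number of synchronizations, and so improves the running efficiency of the system.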
Pages: 1770-1791
Page count: 22
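The abstract's point about iterative problems can be illustrated with a toy model. The sketch below is not taken from the paper and calls neither Hadoop nor Spark; `Storage`, `step`, `chained_jobs`, and `in_memory_pipeline` are hypothetical names introduced only for illustration. It assumes that a chain of MapReduce-style jobs reads its input from storage and writes its output back on every iteration, while a Spark-style pipeline reads once, keeps intermediate results in memory, and writes only the final result.

```python
# Toy model only: counts how often a dataset crosses a simulated storage
# boundary during an iterative job. It does not call Hadoop or Spark.


class Storage:
    """Hypothetical stand-in for a distributed file system."""

    def __init__(self):
        self.reads = 0
        self.writes = 0
        self._data = {}

    def write(self, key, values):
        self.writes += 1
        self._data[key] = list(values)

    def read(self, key):
        self.reads += 1
        return list(self._data[key])


def step(values):
    """One iteration of an arbitrary update rule (halve every value)."""
    return [v / 2 for v in values]


def chained_jobs(storage, iterations):
    """MapReduce-style (assumed): every iteration is a separate job that
    reads its input from storage and writes its output back."""
    for i in range(iterations):
        values = storage.read(f"iter-{i}")
        storage.write(f"iter-{i + 1}", step(values))
    return storage.read(f"iter-{iterations}")


def in_memory_pipeline(storage, iterations):
    """Spark-style (assumed): read once, keep intermediate results in
    memory, and write only the final result."""
    values = storage.read("iter-0")
    for _ in range(iterations):
        values = step(values)
    storage.write("result", values)
    return values


if __name__ == "__main__":
    disk_a, disk_b = Storage(), Storage()
    disk_a.write("iter-0", [8.0, 16.0])
    disk_b.write("iter-0", [8.0, 16.0])

    print(chained_jobs(disk_a, 3), disk_a.reads, disk_a.writes)
    print(in_memory_pipeline(disk_b, 3), disk_b.reads, disk_b.writes)
```

In this model both variants compute the same result, but the storage traffic of the chained variant grows with the number of iterations while that of the in-memory variant stays constant. Real systems also differ in scheduling, shuffle behaviour, and fault recovery, none of which is modelled here.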