A Comparison of MapReduce and Spark for Big Data Analytics

Cited by: 78

Authors:

Wu Xindong (吴信东) [1,2]

Ji Shengwei (嵇圣硙) [1]

Affiliations:

[1] School of Computer and Information, Hefei University of Technology (合肥工业大学计算机与信息学院)

[2] School of Computing and Informatics, University of Louisiana at Lafayette

Source:

Journal of Software (软件学报) | 2018 / Vol. 29 / No. 06

Funding:

National Key Research and Development Program of China

Keywords:

big data; MapReduce; Spark; iterative problems; non-iterative problems

DOI:

10.13328/j.cnki.jos.005557

Chinese Library Classification (CLC) number:

TP311.13

Discipline classification code:

1201

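Abstract:

This paper reviews MapReduce and Spark, two computing algorithms and architectures for big data. It analyzes and compares them in terms of background, principles, and application scenarios, and summarizes the advantages and limitations of each. For non-iterative problems, MapReduce, by virtue of its task scheduling strategy and shuffle mechanism, outperforms Spark in the amount of intermediate data transferred and in the number of files. For iterative problems and some low-latency problems, Spark can partition tasks more sensibly according to the dependencies among the data. Compared with MapReduce, this effectively reduces the amount of intermediate data transferred and the number of synchronizations, which improves the running efficiency of the system.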
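The iterative case in the abstract can be illustrated with a toy model. The sketch below is not taken from the paper and uses neither the Hadoop nor the Spark API: `Storage`, `step`, `chained_jobs`, `cached_loop`, and `run` are hypothetical names, and the "cluster" is a single Python process. It models only one aspect of the comparison, namely that a chain of independent jobs reloads and persists data on every iteration while an in-memory loop does so once. It does not model shuffle, task scheduling, or dependency-based task partitioning.

```python
from dataclasses import dataclass, field


@dataclass
class Storage:
    """Hypothetical stand-in for a distributed file system that counts accesses."""
    data: dict = field(default_factory=dict)
    reads: int = 0
    writes: int = 0

    def read(self, key):
        self.reads += 1
        return self.data[key]

    def write(self, key, value):
        self.writes += 1
        self.data[key] = value


def step(points, center):
    """One iteration: move the estimate halfway toward the mean of the points."""
    mean = sum(points) / len(points)
    return (center + mean) / 2


def chained_jobs(storage, iterations):
    """MapReduce-like (assumed model): every iteration is a separate job that
    reloads its input and persists its result before the next job starts."""
    storage.write("center", 0.0)
    for _ in range(iterations):
        points = storage.read("points")
        center = storage.read("center")
        storage.write("center", step(points, center))
    return storage.read("center")


def cached_loop(storage, iterations):
    """Spark-like (assumed model): load the input once, iterate in memory,
    and persist only the final result."""
    points = storage.read("points")
    center = 0.0
    for _ in range(iterations):
        center = step(points, center)
    storage.write("center", center)
    return center


def run(strategy, iterations=10):
    """Run one strategy on a fresh storage and report (result, reads, writes)."""
    storage = Storage(data={"points": [1.0, 2.0, 3.0, 6.0]})
    result = strategy(storage, iterations)
    return result, storage.reads, storage.writes


if __name__ == "__main__":
    for strategy in (chained_jobs, cached_loop):
        print(strategy.__name__, run(strategy))
```

With 10 iterations, both strategies return the same value, but the chained version makes 21 reads and 11 writes against the counting storage, while the cached version makes 1 read and 1 write. The gap grows linearly with the number of iterations, which is the intuition behind the abstract's claim about iterative workloads under these simplifying assumptions.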

Pages: 1770-1791

Page count: 22

References (10 entries):

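[1] Research progress on MapReduce big data processing platforms and algorithms (MapReduce大数据处理平台与算法研究进展) [J]. Song Jie, Sun Zongzhe, Mao Keming, Bao Yubin, Yu Ge. Journal of Software (软件学报), 2017, 28 (03): 514-543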

[2] An experimental analysis of limitations of MapReduce for iterative algorithms on Spark [J]. Minseo Kang, Jae-Gil Lee. Cluster Computing, 2017 (4)

[3] Performance evaluation of cloud-based log file analysis with Apache Hadoop and Apache Spark [J]. Ilias Mavridis, Helen Karatza. The Journal of Systems & Software, 2017

[4] What is Big Data? [J]. Keith Gordon. ITNow, 2013 (3)

[5] On the optimization of schedules for MapReduce workloads in the presence of shared scans [J]. Joel Wolf, Andrey Balmin, Deepak Rajan, Kirsten Hildrum, Rohit Khandekar, Sujay Parekh, Kun-Lung Wu, Rares Vernica. The VLDB Journal, 2012 (5)

[6] iMapReduce: A Distributed Computing Framework for Iterative Computation [J]. Yanfeng Zhang, Qixin Gao, Lixin Gao, Cuirong Wang. Journal of Grid Computing, 2012 (1)

[7] Adapting scientific computing problems to clouds using MapReduce [J]. Satish Narayana Srirama, Pelle Jakovits, Eero Vainikko. Future Generation Computer Systems, 2011 (1)

[8] Speculative execution in a distributed file system [J]. Edmund B. Nightingale, Peter M. Chen, Jason Flinn. ACM SIGOPS Operating Systems Review, 2005 (5)

[9] Dryad: Distributed Data-parallel Programs from Sequential Building Blocks. M. Isard, M. Budiu, Y. Yu, A. Birrell, D. Fetterly. Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems, 2007

[10] Twister: A runtime for iterative MapReduce. Ekanayake J, Li H, Zhang B, Gunarathne T, Bae S H, Qiu J, Fox G. Proc. of the 19th ACM Int'l Symp. on High Performance Distributed Computing, 2010
