MapReduce: Simplified Data Processing on Large Clusters

Cited by: 2283
Authors
Dean, Jeffrey
Ghemawat, Sanjay
Affiliation
[1] Google, Mountain View, CA
DOI
10.1145/1327452.1327492
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google's clusters every day, processing a total of more than twenty petabytes of data per day.
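The abstract describes the programming model only at a high level: the user supplies a map function and a reduce function, and the runtime handles the rest. The following is a minimal single-process sketch of that model, using word counting as an illustrative task. The names `word_count_map`, `word_count_reduce`, and `run_mapreduce` are hypothetical, and this is not Google's implementation. It omits everything the abstract attributes to the runtime system (parallelization across machines, failure handling, and scheduling of network and disk traffic) and keeps only the map, group-by-key, and reduce structure.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator

# Hypothetical type aliases for illustration only.
MapFn = Callable[[str, str], Iterator[tuple[str, int]]]
ReduceFn = Callable[[str, Iterable[int]], int]


def word_count_map(doc_name: str, contents: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every whitespace-separated word in one document."""
    for word in contents.split():
        yield word, 1


def word_count_reduce(word: str, counts: Iterable[int]) -> int:
    """Reduce: sum all partial counts emitted for one word."""
    return sum(counts)


def run_mapreduce(inputs: dict[str, str], map_fn: MapFn, reduce_fn: ReduceFn) -> dict[str, int]:
    """Sequential driver: map phase, grouping by key, then reduce phase.

    A real runtime would split `inputs` across many machines, re-execute
    failed tasks, and move intermediate data over the network; none of
    that is modeled here.
    """
    # Map phase: apply map_fn to every (key, value) input record and
    # group the intermediate values by their output key (the "shuffle").
    intermediate: defaultdict[str, list[int]] = defaultdict(list)
    for key, value in inputs.items():
        for out_key, out_value in map_fn(key, value):
            intermediate[out_key].append(out_value)

    # Reduce phase: one reduce_fn call per distinct intermediate key,
    # processed in sorted key order.
    return {k: reduce_fn(k, intermediate[k]) for k in sorted(intermediate)}


if __name__ == "__main__":
    docs = {"a.txt": "the quick brown fox", "b.txt": "the lazy dog the end"}
    print(run_mapreduce(docs, word_count_map, word_count_reduce))
    # prints {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

Because each map call sees only one input record and each reduce call sees only one key's values, both phases can in principle be split into independent tasks; that independence is what lets a runtime parallelize the computation without the user writing any distribution code.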
Pages: 107-113
Page count: 7