High-performance remote access to climate simulation data: a challenge problem for data grid technologies

Cited by: 25
Authors
Chervenak, A
Deelman, E
Kesselman, C
Allcock, B
Foster, I
Nefedova, V
Lee, J
Sim, A
Shoshani, A
Drach, B
Williams, D
Middleton, D
Affiliations
[1] Argonne Natl Lab, Div Math & Comp Sci, Argonne, IL 60439 USA
[2] Univ So Calif, Inst Informat Sci, Marina Del Rey, CA 90292 USA
[3] Lawrence Berkeley Natl Lab, Berkeley, CA 94720 USA
[4] Lawrence Livermore Natl Lab, Livermore, CA 94550 USA
[5] Natl Ctr Atmospher Res, Boulder, CO 80305 USA
Keywords
grid computing; Globus Toolkit (R); GridFTP; Earth System Grid
DOI
10.1016/j.parco.2003.06.001
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In numerous scientific disciplines, terabyte and petabyte-scale data collections are emerging as critical community resources. A new class of "data grid" infrastructure is required to support management, transport, distributed access to, and analysis of these datasets by potentially thousands of users. Researchers who face this challenge include the climate modeling community, which performs long-duration computations accompanied by frequent output of very large files that must be further analyzed. We describe the Earth System Grid-I prototype, which brings together advanced analysis, replica management, data transfer, request management, and other technologies to support high-performance, interactive analysis of replicated data. We present performance results that demonstrate our ability to manage the location and movement of large datasets from the user's desktop. We report on experiments conducted over SciNET at SC'2000, where we achieved peak performance of 1.55 Gb/s and sustained performance of 512.9 Mb/s for data transfers between Texas and California. Finally, we describe the development of the next-generation Earth System Grid-II (ESG-II) project. Important issues for ESG-II include security requirements for production environments, efficient data filtering and transport, metadata services for discovery of relevant climate datasets, and sophisticated request or workflow management for complex tasks. (C) 2003 Published by Elsevier B.V.
Pages: 1335-1356
Page count: 22