Optimizing workflow data footprint

Cited by: 23
Authors
Singh, Gurmeet [1]
Vahi, Karan [1]
Ramakrishnan, Arun [1]
Mehta, Gaurang [1]
Deelman, Ewa [1]
Zhao, Henan [2]
Sakellariou, Rizos [2]
Blackburn, Kent [3]
Brown, Duncan [3]
Fairhurst, Stephen [3, 4]
Meyers, David [3]
Berriman, G. Bruce [5]
Good, John [5]
Katz, Daniel S. [6]
Affiliations
[1] USC Informat Sci Inst, Marina Del Rey, CA 90292 USA
[2] Univ Manchester, Sch Comp Sci, Manchester M13 9PL, Lancs, England
[3] CALTECH, LIGO Lab, Pasadena, CA 91125 USA
[4] Univ Wisconsin, Dept Phys, Milwaukee, WI 53202 USA
[5] CALTECH, Ctr Infrared Proc & Anal, Pasadena, CA 91125 USA
[6] Louisiana State Univ, Ctr Computat & Technol, Baton Rouge, LA 70803 USA
Funding
National Science Foundation (USA); Engineering and Physical Sciences Research Council (UK)
DOI
10.1155/2007/701609
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
In this paper we examine the issue of optimizing disk usage and scheduling large-scale scientific workflows onto distributed resources, where the workflows are data-intensive and require large amounts of data storage, and the resources have limited storage capacity. Our approach is two-fold: we minimize the amount of space a workflow requires during execution by removing data files at runtime when they are no longer needed, and we demonstrate that workflows may have to be restructured to reduce their overall data footprint. We show the results of our data management and workflow restructuring solutions using a Laser Interferometer Gravitational-Wave Observatory (LIGO) application and an astronomy application, Montage, running on a large-scale production grid, the Open Science Grid. We show that although the data footprint of Montage can be reduced by 48% with dynamic data cleanup techniques, LIGO Scientific Collaboration workflows require additional restructuring to achieve a 56% reduction in data space usage. We also examine the cost of the workflow restructuring in terms of the application's runtime.
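The runtime cleanup idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm or any workflow system's API: the function `peak_footprint`, the task tuples, and the file sizes are all hypothetical, and the tasks are assumed to run one at a time in an order that already respects their dependencies. The sketch deletes a file as soon as the last task that reads it has finished, and reports the peak storage in use.

```python
from collections import defaultdict


def peak_footprint(tasks, sizes, cleanup=True):
    """Return the peak storage used while running tasks sequentially.

    tasks: list of (name, inputs, outputs) in a dependency-respecting order
           (hypothetical representation, assumed for this sketch).
    sizes: dict mapping file name -> size in arbitrary units.
    cleanup: if True, delete a file once no remaining task reads it.
    """
    # Count how many tasks still need each file as an input.
    remaining = defaultdict(int)
    for _, inputs, _ in tasks:
        for f in inputs:
            remaining[f] += 1

    # Files that are read but never produced are assumed staged in up front.
    produced = {f for _, _, outputs in tasks for f in outputs}
    on_disk = {f for _, inputs, _ in tasks for f in inputs if f not in produced}

    used = sum(sizes[f] for f in on_disk)
    peak = used
    for _, inputs, outputs in tasks:
        # Running the task writes its outputs to disk.
        for f in outputs:
            if f not in on_disk:
                on_disk.add(f)
                used += sizes[f]
        peak = max(peak, used)
        # After the task finishes, drop inputs that no later task needs.
        for f in inputs:
            remaining[f] -= 1
            if cleanup and remaining[f] == 0 and f in on_disk:
                on_disk.remove(f)
                used -= sizes[f]
    return peak


# A three-stage chain: raw -> x -> y -> z, each file 10 units.
tasks = [
    ("a", ["raw"], ["x"]),
    ("b", ["x"], ["y"]),
    ("c", ["y"], ["z"]),
]
sizes = {"raw": 10, "x": 10, "y": 10, "z": 10}

print(peak_footprint(tasks, sizes, cleanup=False))  # 40
print(peak_footprint(tasks, sizes, cleanup=True))   # 20
```

In this toy chain, cleanup halves the peak footprint because at most one input and one output are on disk at a time. For workflows where many files stay live simultaneously, cleanup alone helps less, which is the situation where the abstract says restructuring becomes necessary.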
Pages: 249-268
Page count: 20