A Survey of Data-Intensive Scientific Workflow Management

Cited by: 173
Authors
Liu, Ji [1, 2]
Pacitti, Esther [3, 4]
Valduriez, Patrick [1, 2]
Mattoso, Marta [5]
Affiliations
[1] Inria, MSR Inria Joint Ctr, LIRMM, Montpellier, France
[2] Univ Montpellier, Montpellier, France
[3] Univ Montpellier, Inria, F-34059 Montpellier, France
[4] Univ Montpellier, LIRMM, F-34059 Montpellier, France
[5] Univ Fed Rio de Janeiro, COPPE, Rio de Janeiro, Brazil
Keywords
Scientific workflow; Scientific workflow management system; Grid; Cloud; Multisite cloud; Distributed and parallel data management; Scheduling; Parallelization; CLOUD; PROVENANCE; FRAMEWORK; TAVERNA; EXECUTION; SCIENCE; TASKS
DOI
10.1007/s10723-015-9329-8
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Nowadays, more and more computer-based scientific experiments need to handle massive amounts of data. Their data processing consists of multiple computational steps and the dependencies among them. A data-intensive scientific workflow is useful for modeling such a process. Since the sequential execution of data-intensive scientific workflows may take a long time, Scientific Workflow Management Systems (SWfMSs) should enable the parallel execution of data-intensive scientific workflows and exploit the resources distributed across different infrastructures such as grids and clouds. This paper provides a survey of data-intensive scientific workflow management in SWfMSs and their parallelization techniques. Based on a SWfMS functional architecture, we give a comparative analysis of the existing solutions. Finally, we identify research issues for improving the execution of data-intensive scientific workflows in a multisite cloud.
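The abstract models a data-intensive process as computational steps linked by data dependencies and asks SWfMSs to execute such workflows in parallel. As an illustrative sketch only (not the authors' method and not the scheduler of any particular SWfMS), the Python below treats a workflow as a directed acyclic graph and groups its activities into stages using Kahn-style topological ordering. Activities within one stage have no dependencies among themselves, so they could in principle be dispatched in parallel. The function `parallel_stages` and the activity names (`download`, `split`, `align_1`, `align_2`, `merge`) are hypothetical.

```python
from collections import defaultdict


def parallel_stages(activities, dependencies):
    """Group workflow activities into stages (hypothetical helper).

    activities:   iterable of activity names
    dependencies: list of (producer, consumer) pairs; the consumer reads
                  data produced by the producer, so it must run later.
    Returns a list of stages; activities in the same stage do not depend
    on one another and could run in parallel.
    """
    indegree = {a: 0 for a in activities}
    successors = defaultdict(list)
    for producer, consumer in dependencies:
        successors[producer].append(consumer)
        indegree[consumer] += 1

    stages = []
    # Activities with no unmet inputs are ready to run.
    ready = sorted(a for a, d in indegree.items() if d == 0)
    while ready:
        stages.append(ready)
        next_ready = []
        for a in ready:
            # Finishing `a` satisfies one input of each successor.
            for s in successors[a]:
                indegree[s] -= 1
                if indegree[s] == 0:
                    next_ready.append(s)
        ready = sorted(next_ready)

    # Activities left unscheduled mean the dependency graph has a cycle.
    if sum(len(stage) for stage in stages) != len(indegree):
        raise ValueError("dependencies contain a cycle; not a valid workflow DAG")
    return stages


# Hypothetical workflow: one input is split and aligned in two branches.
stages = parallel_stages(
    ["download", "split", "align_1", "align_2", "merge"],
    [("download", "split"), ("split", "align_1"), ("split", "align_2"),
     ("align_1", "merge"), ("align_2", "merge")],
)
print(stages)  # [['download'], ['split'], ['align_1', 'align_2'], ['merge']]
```

This level-by-level grouping only captures the dependency constraint. A real SWfMS scheduling across grids, clouds, or multiple cloud sites would also have to account for data location, transfer cost, and resource availability, all of which this sketch ignores.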
Pages: 457-493
Page count: 37