A Survey of Data-Intensive Scientific Workflow Management

Cited by: 173
Authors
Liu, Ji [1, 2]
Pacitti, Esther [3, 4]
Valduriez, Patrick [1, 2]
Mattoso, Marta [5]
Affiliations
[1] Inria, MSR Inria Joint Ctr, LIRMM, Montpellier, France
[2] Univ Montpellier, Montpellier, France
[3] Univ Montpellier, Inria, F-34059 Montpellier, France
[4] Univ Montpellier, LIRMM, F-34059 Montpellier, France
[5] Univ Fed Rio de Janeiro, COPPE, Rio De Janeiro, Brazil
Keywords
Scientific workflow; Scientific workflow management system; Grid; Cloud; Multisite cloud; Distributed and parallel data management; Scheduling; Parallelization; Provenance; Framework; Taverna; Execution; Science; Tasks
DOI
10.1007/s10723-015-9329-8
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A growing number of computer-based scientific experiments need to handle massive amounts of data. Their data processing consists of multiple computational steps and the dependencies among them. A data-intensive scientific workflow is useful for modeling such a process. Since the sequential execution of data-intensive scientific workflows can take a long time, Scientific Workflow Management Systems (SWfMSs) should enable the parallel execution of data-intensive scientific workflows and exploit resources distributed across different infrastructures such as grid and cloud. This paper provides a survey of data-intensive scientific workflow management in SWfMSs and their parallelization techniques. Based on an SWfMS functional architecture, we give a comparative analysis of the existing solutions. Finally, we identify research issues for improving the execution of data-intensive scientific workflows in a multisite cloud.
Pages: 457-493
Page count: 37