Design and implementation of task scheduling strategies for massive remote sensing data processing across multiple data centers

Cited by: 22
Authors
Zhang, Wanfeng [1, 3]
Wang, Lizhe [1, 2]
Ma, Yan [1 ]
Liu, Dingsheng [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Remote Sensing & Digital Earth, Beijing 100864, Peoples R China
[2] China Univ Geosci, Sch Comp, Beijing, Peoples R China
[3] Univ Chinese Acad Sci, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
multi-datacenter infrastructure; data intensive computing; task scheduling; big data computing; SHARING FILES; MANAGEMENT
DOI
10.1002/spe.2229
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Data-intensive remote sensing applications are becoming increasingly widespread as computer and network technologies evolve. In particular, bag-of-tasks (BoT) applications that share a large number of input files, and directed acyclic graph (DAG) applications with data dependencies, pose new challenges in widely distributed computing environments. In this article, a partitioning group based on hypergraph (PGH) strategy is introduced to model file sharing. The PGH algorithm partitions BoT applications into several groups to minimize data transfer time. We also adopt a second scheduling policy, the optimized task tree (OTT) strategy, to handle DAG workflows of massive remote sensing data processing with data dependencies; the scheduling queue of DAG tasks is updated as task priorities change. Using the GridSim simulation environment, we designed Gridlets within the scheduler to test the performance of PGH and OTT. Copyright (c) 2013 John Wiley & Sons, Ltd.
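The file-sharing model described in the abstract can be illustrated with a small sketch: tasks are treated as vertices and each shared input file as a hyperedge connecting the tasks that read it. The greedy heuristic below is not the PGH algorithm from the article, whose details are not given here. It is a hypothetical stand-in that shows why grouping tasks by shared files reduces transfers. The names `transfer_count`, `greedy_partition`, `task_files`, `k`, and `capacity` are assumptions made for illustration.

```python
def transfer_count(groups, task_files):
    """Count file transfers, assuming each group fetches each distinct input file once."""
    return sum(len({f for t in g for f in task_files[t]}) for g in groups)


def greedy_partition(task_files, k, capacity):
    """Hypothetical greedy heuristic (not the article's PGH algorithm).

    Places each task in the non-full group that already holds the most of
    its input files, so tasks sharing files tend to end up together.
    """
    if k * capacity < len(task_files):
        raise ValueError("not enough total capacity for all tasks")
    groups = [[] for _ in range(k)]
    group_files = [set() for _ in range(k)]
    # Visit tasks with the most input files first; break ties by task name.
    for task in sorted(task_files, key=lambda t: (-len(task_files[t]), t)):
        files = set(task_files[task])
        candidates = [i for i in range(k) if len(groups[i]) < capacity]
        # Prefer the largest file overlap, then the emptiest group, then the lowest index.
        best = max(
            candidates,
            key=lambda i: (len(files & group_files[i]), -len(groups[i]), -i),
        )
        groups[best].append(task)
        group_files[best] |= files
    return groups


# Toy workload: t1/t2 share files a and b, t3/t4 share files c and d.
task_files = {
    "t1": {"a", "b"},
    "t2": {"a", "b"},
    "t3": {"c", "d"},
    "t4": {"c", "d"},
}
grouped = greedy_partition(task_files, k=2, capacity=2)
naive = [["t1", "t3"], ["t2", "t4"]]
print(grouped, transfer_count(grouped, task_files))
print(naive, transfer_count(naive, task_files))
```

On this toy workload the grouping that respects file sharing needs half as many transfers as the naive split, which is the effect the abstract attributes to partitioning BoT applications into groups.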
Pages: 873-886
Page count: 14