Design and implementation of task scheduling strategies for massive remote sensing data processing across multiple data centers

Times cited: 22
Authors
Zhang, Wanfeng [1, 3]
Wang, Lizhe [1, 2]
Ma, Yan [1]
Liu, Dingsheng [1]
Affiliations
[1] Chinese Acad Sci, Inst Remote Sensing & Digital Earth, Beijing 100864, Peoples R China
[2] China Univ Geosci, Sch Comp, Beijing, Peoples R China
[3] Univ Chinese Acad Sci, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
multi-datacenter infrastructure; data intensive computing; task scheduling; big data computing; sharing files; management
DOI
10.1002/spe.2229
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Data-intensive remote sensing data processing applications are becoming increasingly widespread as computer and network technologies evolve. In particular, bag-of-tasks (BoT) applications with large numbers of shared input files and directed acyclic graph (DAG) applications with data dependencies pose new challenges in widely distributed computing environments. In this article, the partitioning group based on hypergraph (PGH) strategy is introduced to model file sharing. The PGH algorithm partitions BoT applications into several groups so as to minimize data transfer time. We also adopt a second scheduling policy, the optimized task tree (OTT) strategy, to handle DAG workflows of massive remote sensing data processing with data dependencies; the scheduling queue of DAG tasks is updated as task priorities change. Using the GridSim simulation environment, we designed Gridlets within the scheduler to test the performance of PGH and OTT. Copyright (c) 2013 John Wiley & Sons, Ltd.
Pages: 873-886
Number of pages: 14
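The hypergraph model of shared files mentioned in the abstract can be illustrated with a small sketch. It assumes the standard hypergraph-partitioning view: each task is a vertex, each shared input file is a net joining the tasks that read it, and the cost of a grouping is the sum over files of size * (lambda - 1), where lambda is the number of groups that need that file. Here this "connectivity minus one" cost is used only as an assumed proxy for extra data transferred between data centers. The names `task_files`, `file_size`, `connectivity_cost` and `greedy_partition` are hypothetical, and the greedy placement rule is a simple stand-in heuristic, not the authors' PGH algorithm.

```python
from collections import defaultdict


def connectivity_cost(assignment, task_files, file_size):
    """Sum over files of size * (lambda - 1), where lambda is the number of
    groups containing at least one task that reads the file."""
    groups_per_file = defaultdict(set)
    for task, files in task_files.items():
        for f in files:
            groups_per_file[f].add(assignment[task])
    return sum(file_size[f] * (len(groups) - 1) for f, groups in groups_per_file.items())


def greedy_partition(task_files, file_size, num_groups, capacity):
    """Hypothetical greedy heuristic (not PGH): put each task in the group that
    already holds the most of its input data, with at most `capacity` tasks per group."""
    assignment = {}
    load = [0] * num_groups
    files_in_group = [set() for _ in range(num_groups)]
    # Place tasks with the largest total input first.
    order = sorted(task_files, key=lambda t: -sum(file_size[f] for f in task_files[t]))
    for task in order:
        best, best_gain = None, None
        for g in range(num_groups):
            if load[g] >= capacity:
                continue  # group is full
            # Amount of this task's input already present in group g.
            gain = sum(file_size[f] for f in task_files[task] if f in files_in_group[g])
            # Prefer larger reuse; break ties toward the less loaded group.
            if best is None or gain > best_gain or (gain == best_gain and load[g] < load[best]):
                best, best_gain = g, gain
        if best is None:
            raise ValueError("num_groups * capacity is smaller than the number of tasks")
        assignment[task] = best
        load[best] += 1
        files_in_group[best].update(task_files[task])
    return assignment


if __name__ == "__main__":
    # Toy instance: file sizes in arbitrary units (e.g. GB).
    task_files = {"t1": {"a", "b"}, "t2": {"a", "b"}, "t3": {"c"}, "t4": {"c", "d"}}
    file_size = {"a": 4, "b": 2, "c": 5, "d": 1}
    assignment = greedy_partition(task_files, file_size, num_groups=2, capacity=2)
    print(assignment)  # {'t1': 0, 't2': 0, 't4': 1, 't3': 1}
    print(connectivity_cost(assignment, task_files, file_size))  # 0
```

Tasks with the most input data are placed first so that large shared files anchor groups early; the per-group task cap stands in for the load-balance constraint that a real hypergraph partitioner would enforce. In the toy instance, tasks sharing files `a` and `b` land in one group and tasks sharing `c` in the other, so no file needs to be copied to a second group.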
Number of references: 37