A worldwide flock of Condors: Load sharing among workstation clusters

Cited by: 98
Authors
Epema, DHJ
Livny, M
van Dantzig, R
Evers, X
Pruyne, J
Affiliations
[1] UNIV WISCONSIN, DEPT COMP SCI, MADISON, WI 53706 USA
[2] NIKHEF H, NL-1009 DB AMSTERDAM, NETHERLANDS
Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF GRID COMPUTING AND ESCIENCE | 1996 / Vol. 12 / No. 1
Keywords
distributed processing; batch queueing system; wide-area load sharing; ownership rights; flocking;
DOI
10.1016/0167-739X(95)00035-Q
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Condor is a distributed batch system for sharing the workload of compute-intensive jobs in a pool of UNIX workstations connected by a network. In such a Condor pool, idle machines are spotted by Condor and allocated to queued jobs, thus putting otherwise unutilized capacity to efficient use. When institutions owning Condor pools cooperate, they may wish to exploit the joint capacity of their pools in a similar way. So the need arises to extend the Condor load-sharing and protection mechanisms beyond the boundaries of Condor pools, or in other words, to create a flock of Condors. Such a flock may include Condor pools connected by local-area networks as well as by wide-area networks. In this paper we describe the design and implementation of a distributed, layered Condor flocking mechanism. The main concept in this design is the Gateway Machine that represents in each pool idle machines from other pools in the flock and allows job transfers across pool boundaries. Our flocking design is transparent to the workstation owners, to the users, and to Condor itself. We also discuss our experiences with an intercontinental Condor flock.
Pages: 53-65
Page count: 13