Benefits and drawbacks of redundant batch requests

Cited by: 21

Author:

Casanova H. [1]

Affiliation:

[1] Department of Information and Computer Sciences, University of Hawai'i at Manoa, Honolulu, HI 96822

Source:

Journal of Grid Computing | 2007 / Vol. 5 / No. 2

Funding:

U.S. National Science Foundation

Keywords:

Batch scheduling; Job scheduling; Redundant requests;

DOI:

10.1007/s10723-007-9068-6

Abstract:

Most parallel computing platforms are controlled by batch schedulers that place requests for computation in a queue until access to compute nodes is granted. Queue waiting times are notoriously hard to predict, making it difficult for users not only to estimate when their applications may start, but also to pick among multiple batch-scheduled platforms the one that will produce the shortest turnaround time. As a result, an increasing number of users resort to "redundant requests": several requests are simultaneously submitted to multiple batch schedulers on behalf of a single job; once one of these requests is granted access to compute nodes, the others are canceled. Using simulation as well as experiments with a production batch scheduler, we evaluate the impact of redundant requests on (1) average job performance, (2) schedule fairness, (3) system load, and (4) system predictability. We find that some of the popularly held beliefs about the harmfulness of redundant batch requests are unfounded. We also find that the two most critical issues with redundant requests are the additional load on current middleware infrastructures and unfairness towards users who do not use redundant requests. Using our experimental results, we quantify both impacts in terms of the number of users who use redundant requests and of the amount of request redundancy these users employ. © Springer Science + Business Media B.V. 2007.
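The abstract describes the redundant-request rule: submit one request per batch scheduler, let the first grant win, and cancel the rest. The following is a minimal Python sketch of that rule only. It is not the author's simulator.

It assumes that the time at which each scheduler would grant the request is known up front. In practice it is not, and that unpredictability is exactly what motivates redundant requests. The names `submit_redundant`, `RedundantOutcome`, `predicted_waits`, and the cluster labels are hypothetical. The `middleware_ops` count (k submissions plus k - 1 cancellations) illustrates the extra middleware load the abstract mentions.

```python
from dataclasses import dataclass


@dataclass
class RedundantOutcome:
    chosen_cluster: str   # scheduler whose request was granted first
    start_time: float     # time the job starts on that scheduler
    canceled: list        # schedulers whose pending requests were canceled
    middleware_ops: int   # submissions + cancellations issued by the user


def submit_redundant(predicted_waits: dict, submit_time: float = 0.0) -> RedundantOutcome:
    """Hypothetical sketch: one request per scheduler, first grant wins.

    predicted_waits maps a (hypothetical) scheduler name to the queue wait the
    request would experience there; it is assumed known for illustration only.
    """
    if not predicted_waits:
        raise ValueError("need at least one batch scheduler")
    # The request with the shortest wait is granted nodes first
    # (ties broken by name so the result is deterministic).
    chosen = min(predicted_waits, key=lambda c: (predicted_waits[c], c))
    # Every other pending request is canceled once the job starts.
    canceled = sorted(c for c in predicted_waits if c != chosen)
    # k submissions plus (k - 1) cancellations hit the middleware.
    ops = len(predicted_waits) + len(canceled)
    return RedundantOutcome(chosen, submit_time + predicted_waits[chosen], canceled, ops)


if __name__ == "__main__":
    out = submit_redundant({"clusterA": 120.0, "clusterB": 45.0, "clusterC": 300.0})
    print(out)
```

With three schedulers, the job starts on `clusterB` after 45 time units. Two requests are canceled, and five middleware operations are issued instead of the one needed for a single request.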

Pages: 235-250

Page count: 15

References (24 in total):

[1]

Brevik J., Nurmi D., Wolski R., Predicting bounds on queuing delay for batch-scheduled parallel machines, Proc. of the 11th ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming (PPoPP), pp. 110-118, (2006)

[2]

Bucur A., Epema D., The performance of processor co-allocation in multicluster systems, Proc. of the 3rd IEEE International Symposium on Cluster Computing and the Grid (CCGrid), pp. 302-309, (2003)

[3]

Capit N., Da Costa G., Georgiou Y., Huard G., Martin C., Mounie G., Neyron P., Richard O., A batch scheduler with high level components, Proc. of the 5th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid), pp. 776-783, (2005)

[4]

Feitelson D., Parallel Workloads Archive, (2006)

[5]

Feitelson D.G., Rudolph L., Schwiegelshohn U., Parallel job scheduling - A status report, Proc. of the 10th Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP), Lecture Notes in Computer Science, 3277, pp. 1-16, (2004)

[6]

Gudgin M., Hadley M., Mendelsohn U., Moreau J.-J., Canon S., Nielsen H., Simple Object Access Protocol 1.1, (2003)

[7]

Hamscher V., Schwiegelshohn U., Streit A., Yahyapour R., Evaluation of job-scheduling strategies for Grid computing, Proc. of the 1st IEEE/ACM International Workshop on Grid Computing, Lecture Notes in Computer Science, 1971, pp. 191-202, (2000)

[8]

Head M.R., Govindaraju M., Slominski A., Liu P., Abu-Ghazaleh N., van Engelen R., Chiu K., Lewis M.J., A benchmark suite for SOAP-based communication in Grid web services, Proc. of the 2005 ACM/IEEE Conference on Supercomputing (SC), pp. 19-31, (2005)

[9]

Legrand A., Marchal L., Casanova H., Scheduling distributed applications: The SimGrid simulation framework, Proc. of the 3rd IEEE International Symposium on Cluster Computing and the Grid (CCGrid), pp. 138-145, (2003)

[10]

Lifka D., The ANL/IBM SP scheduling system, Proc. of the 1st Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP), Lecture Notes in Computer Science, 949, pp. 295-303, (1995)
