Parallel algorithms with processor failures and delays

被引：32

作者：

Buss, JF

Kanellakis, PC

Ragde, PL

Shvartsman, AA

机构：

[1] BROWN UNIV, DEPT COMP SCI, PROVIDENCE, RI 02912 USA

[2] MIT, COMP SCI LAB, CAMBRIDGE, MA 02139 USA

来源：

JOURNAL OF ALGORITHMS-COGNITION INFORMATICS AND LOGIC | 1996年 / 20卷 / 01期

基金：

加拿大自然科学与工程研究理事会; 美国国家科学基金会;

关键词：

D O I：

10.1006/jagm.1996.0003

中图分类号：

TP301 [理论、方法];

学科分类号：

081202 ;

摘要：

We study efficient deterministic parallel algorithms on two models: restartable fail-stop CRCW PRAMs and asynchronous PRAMs. In the first model, synchronous processes are subject to arbitrary stop failures and restarts determined by an on-line adversary and involving loss of private but not shared memory; the complexity measures are completed work (where processors are charged for completed fixed-size update cycles) and overhead ratio (completed work amortized over necessary work and failures). In the second model, the result of the computation is a serialization of the actions of the processors determined by an on-line adversary; the complexity measure is total work (number of steps taken by all processors). Despite their differences, the two models share key algorithmic techniques. We present new algorithms for the Write-All problem (in which P processors write ones into an array of size N) for the two models. These algorithms can be used to implement a simulation strategy for any N processor PRAM on a restartable fail-stop P processor CRCW PRAM such that it guarantees a terminating execution of each simulated N processor step, with O(log(2) N) overhead ratio, and O(min{N + P log(2) N + M log N, N . p(0.59)}) (subquadratic) completed work (where M is the number of failures during this step's simulation). This strategy has a range of optimality. We also show that the Write-All requires N + Omega(P log P) completed/total work on these models for P less than or equal to N. (C) 1996 Academic Press, Inc.

引用

页码：45 / 86

页数：42

共 49 条

[1] A SURVEY AND COMPARISON OF FAULT-TOLERANT MULTISTAGE INTERCONNECTION NETWORKS [J].

ADAMS, GB ;

AGRAWAL, DP ;

SIEGEL, HJ .

COMPUTER, 1987, 20 (06) :14-27

[2]

Afek Y., 1987, 28th Annual Symposium on Foundations of Computer Science (Cat. No.87CH2471-1), P347, DOI 10.1109/SFCS.1987.38

[3]

ANDERSON RJ, 1991, 23RD P ACM S THEOR C, P370

[4]

ANDERSON RJ, 1990, 2ND P ANN ACM S PAR, P95

[5]

ASSAF S, 1990, ANN IEEE SYMP FOUND, P275

[6]

BUSS J, 1990, CERTIFIED WRITE ALL

[7]

CHOR B, 1987, ACM PODC, P86

[8]

COLE R, 1990, 2ND P ACM S PAR ALG, P85

[9]

COLE R, 1989, 1989 P ACM S PAR ALG, P170

[10] UNDERSTANDING FAULT-TOLERANT DISTRIBUTED SYSTEMS [J].

CRISTIAN, F .

COMMUNICATIONS OF THE ACM, 1991, 34 (02) :56-78

← 1 2 3 4 5 →