Fault-tolerant wait-free shared objects

被引:56
作者
Jayanti, P [1 ]
Chandra, TD
Toueg, S
机构
[1] Dartmouth Coll, Sudikoff Lab Comp Sci 6211, Hanover, NH 03755 USA
[2] IBM, TJ Watson Res Ctr, Hawthorne, NY 10532 USA
[3] Cornell Univ, Dept Comp Sci, Ithaca, NY 14853 USA
关键词
asynchronous computing; fault-tolerance; implementation; MIMD; shared memory; shared objects; synchronization;
D O I
10.1145/278298.278305
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Wait-free implementations of shared objects tolerate the failure of processes, but not the failure of base objects from which they are implemented. We consider the problem of implementing shared objects that tolerate the failure of both processes and base objects. We identify two classes of object failures: responsive and nonresponsive. With responsive failures, a faulty object responds to every operation, but its responses may be incorrect. With nonresponsive failures, a faulty object may also "hang" without responding. In each class, we define crash, omission, and arbitrary modes of failure. We show that all responsive failure modes can be tolerated. More precisely, for all responsive failure modes F, object types T, and t greater than or equal to 0, we show how to implement a shared object of type T which is t-tolerant for F. Such an object remains correct and wait-free even if up to t base objects fail according to F. In contrast to responsive failures, we show that even the most benign non-responsive failure mode cannot be tolerated. We also show that randomization can be used to circumvent this impossibility result. Graceful degradation is a desirable property of fault-tolerant implementations: the implemented object never fails more severely than the base objects it is derived from, even if all the base objects fail. For several failure modes, we show whether this property can be achieved, and, if so, how.
引用
收藏
页码:451 / 500
页数:50
相关论文
共 37 条
[1]   Computing with faulty shared objects [J].
Afek, Y ;
Greenberg, DS ;
Merrit, M ;
Taubenfeld, G .
JOURNAL OF THE ACM, 1995, 42 (06) :1231-1274
[2]  
Afek Y., 1992, Proceedings of the Eleventh Annual ACM Symposium on Principles of Distributed Computing, P47, DOI 10.1145/135419.135431
[3]  
[Anonymous], 1988, MITLCSTM373
[4]  
ASPNES J, 1990, PROCEEDINGS OF THE NINTH ANNUAL ACM SYMPOSIUM ON PRINCIPLES OF DISTRIBUTED COMPUTING, P325, DOI 10.1145/93385.93433
[5]  
BERMAN P, 1989, P 30 ANN S FDN COMP, P410, DOI DOI 10.1109/SFCS.1989.63511
[6]  
BLOOM B, 1987, P 6 ACM S PRINC DIST, P249
[7]  
BURNS JE, 1987, P 6 ANN ACM S PRINC, P222
[8]   MODULAR CONSTRUCTION OF A BYZANTINE AGREEMENT PROTOCOL WITH OPTIMAL MESSAGE BIT COMPLEXITY [J].
COAN, BA ;
WELCH, JL .
INFORMATION AND COMPUTATION, 1992, 97 (01) :61-85
[9]  
COAN BA, 1987, THESIS MIT CAMBRIDGE
[10]   CONCURRENT CONTROL WITH READERS AND WRITERS [J].
COURTOIS, PJ ;
HEYMANS, F ;
PARNAS, DL .
COMMUNICATIONS OF THE ACM, 1971, 14 (10) :667-&