On quiescent reliable communication

被引:33
作者
Aguilera, MK [1 ]
Chen, W [1 ]
Toueg, S [1 ]
机构
[1] Cornell Univ, Dept Comp Sci, Ithaca, NY 14853 USA
关键词
algorithms; reliability; reliable communication; quiescence; asynchronous systems; heartbeat; crash failures; link failures; failure detection; fault-tolerance; message passing; processor failures;
D O I
10.1137/S0097539798341296
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
We study the problem of achieving reliable communication with quiescent algorithms (i.e., algorithms that eventually stop sending messages) in asynchronous systems with process crashes and lossy links. We first show that it is impossible to solve this problem in asynchronous systems (with no failure detectors). We then show that, among failure detectors that output lists of suspects, the weakest one that can be used to solve this problem is lozenge P, a failure detector that cannot be implemented. To overcome this difficulty, we introduce an implementable failure detector called Heartbeat and show that it can be used to achieve quiescent reliable communication. Heartbeat is novel: in contrast to typical failure detectors, it does not output lists of suspects and it is implementable without timeouts. With Heartbeat, many existing algorithms that tolerate only process crashes can be transformed into quiescent algorithms that tolerate both process crashes and message losses. This can be applied to consensus, atomic broadcast, k-set agreement, atomic commitment, etc.
引用
收藏
页码:2040 / 2073
页数:34
相关论文
共 35 条
[1]   Using the heartbeat failure detector for quiescent reliable communication and consensus in partitionable networks [J].
Aguilera, MK ;
Chen, W ;
Toueg, S .
THEORETICAL COMPUTER SCIENCE, 1999, 220 (01) :3-30
[2]  
Aguilera MK, 1999, SIAM J COMPUT, V28, P890, DOI 10.1137/S0097539796312915
[3]  
BABAOGLU O, 1993, DISTRIBUTED SYSTEMS, pCH6
[4]  
BABAOGLU O, 1997, UBLCS971 U BOL DEP C
[5]  
BASU A, 1996, LNCS, V1151, P105
[6]  
BAZZI R, 1992, LECT NOTES COMPUT SC, V647, P166
[7]  
Ben-Or Michael, 1983, Proceedings of the second ACM Symposium on Principles of Distributed Computing (PODC), P27
[8]  
Biran O., 1988, Proceedings of the Seventh Annual ACM Symposium on Principles of Distributed Computing, P263, DOI 10.1145/62546.62590
[9]   ASYNCHRONOUS CONSENSUS AND BROADCAST PROTOCOLS [J].
BRACHA, G ;
TOUEG, S .
JOURNAL OF THE ACM, 1985, 32 (04) :824-840
[10]  
CHANDRA T, 1997, COMMUNICATION