A simple and fast asynchronous consensus protocol based on a weak failure detector

Cited by: 40
Authors
Hurfin, M. [1]
Raynal, M. [1]
Affiliation
[1] IRISA, F-35042 Rennes, France
Keywords
asynchronous distributed systems; consensus problem; crash failures; fault-tolerance; unreliable failure detectors
DOI
10.1007/s004460050067
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Subject classification code
081202
Abstract
The Consensus problem is a fundamental paradigm for fault-tolerant asynchronous systems. It abstracts a family of problems known as Agreement (or Coordination) problems, and any solution to consensus can serve as a basic building block for solving them (e.g., atomic commitment or atomic broadcast). Solving consensus in an asynchronous system is not a trivial task: Fischer, Lynch and Paterson proved in 1985 that there is no deterministic solution in asynchronous systems subject to even a single crash failure. To circumvent this impossibility result, Chandra and Toueg introduced the concept of unreliable failure detectors in 1991 and studied how these failure detectors can be used to solve consensus in asynchronous systems with crash failures. This paper presents a new consensus protocol that uses a failure detector of the class ◇S (eventually strong). Like previous protocols, it is based on the rotating coordinator paradigm and proceeds in asynchronous rounds. Simplicity and efficiency are the main characteristics of this protocol. From a performance point of view, the protocol is particularly efficient when the underlying failure detector makes no mistakes, whether or not failures occur (a common case in practice). From a design point of view, the protocol combines three simple mechanisms: a voting mechanism, a small finite state automaton that manages the behavior of each process, and the possibility for a process to change its mind during a round.
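To make the three ingredients named in the abstract more concrete (a rotating coordinator, voting, and a small per-process automaton in which a process may change its mind), here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' protocol: the names `coordinator`, `RoundSketch`, `Vote.CURRENT`/`Vote.NEXT` and the states `Q0`, `Q1`, `Q2` are hypothetical, and the transition rules are a simplification that omits the message exchanges, the handling of estimates across rounds and the conditions on which the paper's safety argument relies. It also assumes, as is usual for ◇S-based consensus, that fewer than n/2 processes crash, so any two majorities intersect.

```python
from enum import Enum


class Vote(Enum):
    CURRENT = "current"  # hypothetical: "I hold the coordinator's estimate"
    NEXT = "next"        # hypothetical: "I suspect the coordinator; move on"


class State(Enum):
    Q0 = 0  # has not voted yet in this round
    Q1 = 1  # voted CURRENT
    Q2 = 2  # voted NEXT


def coordinator(round_no: int, n: int) -> int:
    """Rotating coordinator: round r is coordinated by process r mod n."""
    return round_no % n


class RoundSketch:
    """Toy per-round state of one process (illustrative, not the paper's rules)."""

    def __init__(self, n: int, est):
        self.n = n
        self.est = est
        self.state = State.Q0
        self.votes = {Vote.CURRENT: set(), Vote.NEXT: set()}

    def majority(self) -> int:
        # Assumption: fewer than n/2 crashes, so a majority always responds.
        return self.n // 2 + 1

    def receive_estimate(self, value) -> None:
        # Adopt the coordinator's estimate. Allowed from Q0 and, as a
        # "change of mind", from Q2 after having already voted NEXT.
        if self.state in (State.Q0, State.Q2):
            self.est = value
            self.state = State.Q1

    def suspect_coordinator(self) -> None:
        # The local failure detector suspects the current coordinator.
        if self.state is State.Q0:
            self.state = State.Q2

    def receive_vote(self, sender: int, vote: Vote, value=None) -> None:
        self.votes[vote].add(sender)
        if vote is Vote.CURRENT:
            # A CURRENT vote carries the coordinator's estimate.
            self.receive_estimate(value)

    def outcome(self):
        # A majority of CURRENT votes lets this toy process decide; a
        # majority of NEXT votes sends it to the next round with its estimate.
        if len(self.votes[Vote.CURRENT]) >= self.majority():
            return ("decide", self.est)
        if len(self.votes[Vote.NEXT]) >= self.majority():
            return ("next_round", self.est)
        return None
```

For example, a process in a 5-process system that first suspects the coordinator (moving to `Q2`) and then receives a `CURRENT` vote carrying value `"a"` changes its mind: it moves to `Q1`, adopts `"a"`, and decides once three `CURRENT` votes have arrived.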
Pages: 209-223
Number of pages: 15
References (15 in total)
[1] AGUILERA MK. P 10 INT WORKSH DIST, 1996: 29.
[2] BIRMAN KP; JOSEPH TA. Reliable communication in the presence of failures. ACM Transactions on Computer Systems, 1987, 5(1): 47-76.
[3] BIRMAN KP. BUILDING SECURE RELI, 1996.
[4] CHANDRA T. J ACM, 1996, 43: 225.
[5] CHANDRA TD; HADZILACOS V; TOUEG S. The weakest failure detector for solving Consensus. Journal of the ACM, 1996, 43(4): 685-722.
[6] DOLEV D; REISCHUK R; STRONG HR. Early stopping in Byzantine agreement. Journal of the ACM, 1990, 37(4): 720-741.
[7] DOLEV D; DWORK C; STOCKMEYER L. On the minimal synchronism needed for distributed consensus. Journal of the ACM, 1987, 34(1): 77-97.
[8] DWORK C; LYNCH N; STOCKMEYER L. Consensus in the presence of partial synchrony. Journal of the ACM, 1988, 35(2): 288-323.
[9] FISCHER MJ; LYNCH NA; PATERSON MS. Impossibility of distributed consensus with one faulty process. Journal of the ACM, 1985, 32(2): 374-382.
[10] GRAY J. Lecture Notes in Computer Science, 1978, 60: 393.