Message logging: Pessimistic, optimistic, causal, and optimal

被引:117
作者
Alvisi, L [1 ]
Marzullo, K
机构
[1] Univ Texas, Dept Comp Sci, Austin, TX 78712 USA
[2] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
基金
美国国家科学基金会; 美国国家航空航天局;
关键词
message logging; optimistic protocols; pessimistic protocols; checkpoint-restart protocols; resilient processes; specification of fault-tolerance techniques;
D O I
10.1109/32.666828
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
Message-logging protocols are an integral part of a popular technique for implementing processes that can recover from crash failures. All message-logging protocols require that, when recovery is complete, there be no orphan processes, which are surviving processes whose states are inconsistent with the recovered state of a crashed process. We give a precise specification of the consistency property "no orphan processes." From this specification, we describe how different existing classes of message-logging protocols (namely optimistic, pessimistic, and a class that we call causal) implement this property. We then propose a set of metrics to evaluate the performance of message-logging protocols, and characterize the protocols that are optimal with respect to these metrics. Finally, starting from a protocol that relies on causal delivery order, we show how to derive optimal causal protocols that tolerate f overlapping failures and recoveries for a parameter f: 1 less than or equal to f less than or equal to n.
引用
收藏
页码:149 / 159
页数:11
相关论文
共 22 条
  • [1] Alvisi L., 1996, Proceedings of the Fifteenth Annual ACM Symposium on Principles of Distributed Computing, P58, DOI 10.1145/248052.248061
  • [2] ALVISI L, 1993, P 23 FAULT TOL COMP, P145
  • [3] ALVISI L, 1996, THESIS CORNELL U DEP
  • [4] BIRMAN K, 1991, ACM T COMPUT SYST, V9, P272, DOI 10.1145/128738.128742
  • [5] RELIABLE COMMUNICATION IN THE PRESENCE OF FAILURES
    BIRMAN, KP
    JOSEPH, TA
    [J]. ACM TRANSACTIONS ON COMPUTER SYSTEMS, 1987, 5 (01): : 47 - 76
  • [6] BORG A, 1983, P 9 ACM S OP SYST PR, P90
  • [7] MANETHO - TRANSPARENT ROLLBACK-RECOVERY WITH LOW OVERHEAD, LIMITED ROLLBACK, AND FAST OUTPUT COMMIT
    ELNOZAHY, EN
    ZWAENEPOEL, W
    [J]. IEEE TRANSACTIONS ON COMPUTERS, 1992, 41 (05) : 526 - 531
  • [8] ELNOZAHY EN, 1994, 24 INT S FAULT TOL C, P298
  • [9] GRAY JN, 1977, LECT NOTES COMPUTER, V60, P393
  • [10] Johnson D. B., 1987, 17 ANN INT S FAULT T, P14