Preventing useless checkpoints in distributed computations

Cited by: 26

Authors:

Helary, JM

Mostefaoui, A

Netzer, RHB

Raynal, M

Source:

SIXTEENTH SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS, PROCEEDINGS | 1997

DOI:

10.1109/RELDIS.1997.632814

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

A useless checkpoint is a local checkpoint that cannot be part of a consistent global checkpoint. This paper addresses the following important problem: given a set of processes that take (basic) local checkpoints in an independent and unknown way, design a communication-induced checkpointing protocol that directs processes to take additional (forced) local checkpoints to ensure that no local checkpoint is useless. A general and efficient protocol solving this problem is proposed. It is shown that several existing protocols that solve the same problem are particular instances of it. The design of this general protocol is motivated by the use of communication-induced checkpointing protocols in "consistent global checkpoint"-based distributed applications. Detection of stable or unstable properties, rollback-recovery, and determination of distributed breakpoints are examples of such applications.
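
To make the idea of a forced checkpoint concrete, the following is a minimal sketch of a simple index-based communication-induced checkpointing rule. It is an illustration under stated assumptions, not the general protocol proposed in the paper: the class `Process`, its fields `sn` and `log`, and the method names are all hypothetical. Each process keeps a checkpoint sequence number, piggybacks it on every message, and takes a forced checkpoint before delivering a message that carries a larger number.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical process model for an index-based checkpointing rule."""
    name: str
    sn: int = 0  # sequence number of the latest local checkpoint
    log: list = field(default_factory=list)  # (kind, sn) per checkpoint taken

    def basic_checkpoint(self) -> None:
        # A basic checkpoint is taken independently by the process.
        self.sn += 1
        self.log.append(("basic", self.sn))

    def send(self) -> int:
        # The current sequence number is piggybacked on the message.
        return self.sn

    def receive(self, piggybacked_sn: int) -> None:
        # If the sender is ahead, take a forced checkpoint BEFORE delivery
        # so the receiver's sequence number catches up with the sender's.
        if piggybacked_sn > self.sn:
            self.sn = piggybacked_sn
            self.log.append(("forced", self.sn))
        # The message would be delivered to the application here.


p, q = Process("p"), Process("q")
p.basic_checkpoint()   # p takes a basic checkpoint with sn = 1
q.receive(p.send())    # q is behind, so it takes a forced checkpoint
p.receive(q.send())    # p is not behind, so no forced checkpoint
```

In this sketch, `q` ends up with one forced checkpoint numbered 1 and `p` with only its basic checkpoint. The intuition is that no message sent after a checkpoint numbered n is delivered before the receiver has a checkpoint numbered at least n.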

Pages: 183-190

Number of pages: 8