Impact of checkpoint latency on overhead ratio of a checkpointing scheme

Cited by: 87
Author
Vaidya, NH
Affiliation
Department of Computer Science, 301 H.R. Bright Building, Texas A&M University, College Station
Funding
U.S. National Science Foundation
Keywords
checkpointing and rollback; checkpoint latency; checkpoint overhead; performance analysis
DOI
10.1109/12.609281
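Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812 [Computer Science and Technology]
Abstract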
Checkpointing reduces loss of computation in the presence of failures. Two metrics characterize a checkpointing scheme: checkpoint overhead and checkpoint latency. This paper shows that a large increase in latency is acceptable if it is accompanied by a relatively small reduction in overhead. Also, for equidistant checkpoints, the optimal checkpoint interval is shown to be typically independent of checkpoint latency.
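Both claims in the abstract can be explored numerically once a concrete cost model is fixed. The sketch below assumes a model that the abstract itself does not state: failures arrive as a Poisson process with rate `lam`, checkpoints are equidistant with `T` units of useful work between them, `C` is the checkpoint overhead, `L` the checkpoint latency (`L >= C`) and `R` the recovery cost. It takes the expected time to complete one interval as Gamma = (1/lam) * exp(lam*(L - C + R)) * (exp(lam*(T + C)) - 1) and the overhead ratio as r = Gamma/T - 1. The function names `overhead_ratio` and `best_interval` and all parameter values are hypothetical and chosen only for illustration.

```python
import math

# ASSUMED MODEL (illustrative only, not quoted from the abstract):
#   lam : failure rate of a Poisson failure process (failures per second)
#   T   : useful work between consecutive equidistant checkpoints (seconds)
#   C   : checkpoint overhead, time the application is stalled per checkpoint
#   L   : checkpoint latency, time until the checkpoint is usable (L >= C)
#   R   : recovery cost after a failure
# Expected time to finish one interval of T useful work:
#   Gamma = (1/lam) * exp(lam*(L - C + R)) * (exp(lam*(T + C)) - 1)
# Overhead ratio: r = Gamma / T - 1


def overhead_ratio(T: float, C: float, L: float, R: float, lam: float) -> float:
    """Overhead ratio r = Gamma/T - 1 under the assumed model."""
    # expm1 keeps precision when lam*(T + C) is small
    gamma = math.exp(lam * (L - C + R)) * math.expm1(lam * (T + C)) / lam
    return gamma / T - 1.0


def best_interval(C: float, L: float, R: float, lam: float, grid) -> float:
    """Return the interval in `grid` with the smallest overhead ratio."""
    return min(grid, key=lambda T: overhead_ratio(T, C, L, R, lam))


if __name__ == "__main__":
    lam, R = 1e-5, 10.0             # hypothetical: MTBF of 100,000 s, 10 s recovery
    grid = range(100, 10_001, 10)   # candidate intervals in seconds

    # Scheme A: C = L = 10 s. Scheme B: overhead halved, latency 30x larger.
    for name, C, L in (("A", 10.0, 10.0), ("B", 5.0, 300.0)):
        T = best_interval(C, L, R, lam, grid)
        print(name, T, round(overhead_ratio(T, C, L, R, lam), 5))

    # Same C and R, very different L: L only contributes the constant factor
    # exp(lam*(L - C + R)), so the minimizing interval does not move.
    print(best_interval(10.0, 10.0, R, lam, grid),
          best_interval(10.0, 1000.0, R, lam, grid))
```

Under this assumed model, scheme B (half the overhead, 30 times the latency) ends up with a lower overhead ratio than scheme A, and the minimizing interval is the same for L = 10 s and L = 1000 s, up to grid resolution. A grid search is used here for transparency; solving dr/dT = 0 with a root finder would give the optimum more precisely.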
Pages: 942-947
Page count: 6