Towards Fault-tolerant HLA-based Distributed Simulations

Times cited: 8
Authors
Chen, Dan [1]
Turner, Stephen J. [2]
Cai, Wentong [2]
Affiliations
[1] Yanshan Univ, Inst Elect Engn, Qinhuangdao 066004, Peoples R China
[2] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore
Source
SIMULATION-TRANSACTIONS OF THE SOCIETY FOR MODELING AND SIMULATION INTERNATIONAL | 2008 / Vol. 84 / Issues 10-11
Keywords
High Level Architecture; runtime infrastructure; fault tolerance; Decoupled Federate Architecture
DOI
10.1177/0037549708095518
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline codes
081203; 0835
Abstract
Large-scale High Level Architecture (HLA)-based simulations are built to study complex problems and often involve a large number of federates and vast computing resources. Simulation federates running at different locations are subject to failure, and the failure of a single federate can crash the entire simulation execution. This risk grows with the scale of a distributed simulation, so fault tolerance is required to ensure runtime robustness. This paper introduces a framework for robust HLA-based distributed simulations using a 'Decoupled Federate Architecture'. The framework provides a generic fault-tolerant model that handles failures through dynamic substitution. A sender-based method ensures reliable delivery of in-transit messages and is coupled with a novel algorithm for effective fossil collection. The fault-tolerant model also avoids unnecessary repeated computation when handling failures. Because it follows a middleware approach, the framework supports reuse of legacy federate code, is platform-neutral, and is independent of federate modeling approaches. Experiments using a supply-chain simulation example were carried out to validate and benchmark the fault-tolerant federates. The results show that the framework recovers correctly from failures, incurs only minimal overhead for fault tolerance, and shows promising scalability.
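The abstract does not give the details of the sender-based method or of the fossil-collection algorithm. As a rough illustration of the general idea only, the Python sketch below shows textbook sender-based message logging: each sender keeps a copy of every outgoing message, replays to a substitute federate the messages timestamped after the failed federate's last checkpoint, and discards (fossil-collects) entries that the receiver's checkpoint already covers. The class `SenderLog`, its methods, and the assumption that a checkpoint taken at simulation time T reflects every message with timestamp <= T are all hypothetical; this is not the paper's algorithm.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LoggedMessage:
    timestamp: float  # simulation time of the message
    receiver: str     # hypothetical federate identifier
    payload: str


@dataclass
class SenderLog:
    """Hypothetical sender-side log kept by one federate (illustrative only)."""
    entries: List[LoggedMessage] = field(default_factory=list)

    def send(self, timestamp: float, receiver: str, payload: str) -> LoggedMessage:
        # Record a copy before (conceptually) handing it to the RTI for delivery.
        msg = LoggedMessage(timestamp, receiver, payload)
        self.entries.append(msg)
        return msg

    def replay(self, receiver: str, checkpoint_time: float) -> List[LoggedMessage]:
        # Messages a substitute for `receiver`, restored from a checkpoint taken
        # at `checkpoint_time`, has not yet seen (assumption: the checkpoint
        # covers every message with timestamp <= checkpoint_time).
        return [m for m in self.entries
                if m.receiver == receiver and m.timestamp > checkpoint_time]

    def fossil_collect(self, checkpoints: Dict[str, float]) -> int:
        # Drop entries already covered by the receiver's latest checkpoint;
        # receivers without a checkpoint keep all their messages.
        kept = [m for m in self.entries
                if m.timestamp > checkpoints.get(m.receiver, float("-inf"))]
        removed = len(self.entries) - len(kept)
        self.entries = kept
        return removed
```

For example, after logging messages to B at times 1.0 and 2.0 and to C at 3.0, `fossil_collect({"B": 1.5})` discards only the first message, and `replay("B", 1.5)` returns the message at 2.0 for a substitute of B. Keeping the log at the sender means a failed receiver's state can be rebuilt without asking the failed process for anything.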
Pages: 493-509
Number of pages: 17