Experimental evaluation of error-detection mechanisms

Cited by: 12
Authors
Constantinescu, C [1]
Affiliations
[1] Intel Corp, Enterprise Architecture Lab JF1 231, Hillsboro, OR 97124 USA
Keywords
coverage probability; error detection; fault injection; statistical inference
DOI
10.1109/TR.2002.805785
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Effective error detection is paramount for building highly dependable computing systems. A new methodology, based on physical and simulated fault injection, has been developed for assessing the effectiveness of error-detection mechanisms. This approach has 2 steps: 1) Transient faults are physically injected at the IC pin level of a prototype, in order to derive the error-detection coverage. Experiments are carried out in a 3-dimensional space of events. Fault location, time of occurrence, and duration of the injected fault are the dimensions of this space. 2) Simulated fault injection is performed to assess the effectiveness of new error-detection mechanisms, designed to improve the detection coverage. Complex circuitry, based on checking for protocol violations, is considered. A temporal model of the protocol checker is used, and transient faults are injected in signal traces captured from the prototype system. These traces are used as inputs of the simulation engine. s-Confidence intervals of the error-detection coverage are derived, both for the initial design and the new detection mechanism. Physical fault injection, carried out on a prototype server, proved that several signals were sensitive to transient faults and that error-detection coverage was unacceptably low. Simulated fault injection shows that an error-detection mechanism, based on checking for protocol violations, can appreciably increase the detection coverage, especially for transient faults longer than 200 nanoseconds. Additional research is required for improving error detection for shorter transients. Fault injection experiments also show that error-detection coverage is a function of fault duration: the shorter the transient fault, the lower the coverage. As a consequence, injecting faults that have a unique, predefined duration, as was frequently done in the past, does not provide accurate information on the effectiveness of the error-detection mechanisms. Injecting only permanent faults leads to unrealistically high estimates of the coverage. These experiments prove that combined physical and simulated fault injection, performed in a 3-dimensional space of events, is a superior approach, which allows designers to accurately assess the efficacy of various candidate error-detection mechanisms without building expensive test circuits.
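The abstract's idea of sampling faults from a 3-dimensional space of events (location, time of occurrence, duration) and reporting coverage with a confidence interval per fault duration can be sketched as follows. This is a minimal illustration, not the paper's procedure: the names Fault, sample_fault, wilson_interval, and coverage_by_duration are hypothetical, the uniform sampling scheme is an assumption, and the Wilson score interval is a swapped-in, commonly used binomial interval rather than the s-confidence interval derivation used by the author.

```python
import math
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fault:
    """One point in the assumed 3-D event space (all fields are illustrative)."""
    location: str       # signal or pin where the transient is injected
    start_ns: float     # time of occurrence within the observation window
    duration_ns: float  # length of the transient


def sample_fault(rng, locations, window_ns, durations_ns):
    """Draw one fault uniformly over location, start time, and duration (an assumption)."""
    return Fault(
        location=rng.choice(locations),
        start_ns=rng.uniform(0.0, window_ns),
        duration_ns=rng.choice(durations_ns),
    )


def wilson_interval(detected, injected, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if injected <= 0:
        raise ValueError("at least one injected fault is required")
    p = detected / injected
    denom = 1.0 + z * z / injected
    centre = (p + z * z / (2.0 * injected)) / denom
    half = z * math.sqrt(p * (1.0 - p) / injected
                         + z * z / (4.0 * injected * injected)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def coverage_by_duration(outcomes):
    """Group (Fault, was_detected) pairs by duration.

    Returns {duration_ns: (point_estimate, lower_bound, upper_bound)}.
    """
    counts = defaultdict(lambda: [0, 0])  # duration -> [detected, injected]
    for fault, was_detected in outcomes:
        counts[fault.duration_ns][1] += 1
        if was_detected:
            counts[fault.duration_ns][0] += 1
    result = {}
    for duration, (detected, injected) in counts.items():
        low, high = wilson_interval(detected, injected)
        result[duration] = (detected / injected, low, high)
    return result
```

Reporting one interval per duration, rather than a single pooled figure, reflects the abstract's observation that coverage depends on fault duration, so a campaign that injects only one predefined duration cannot reveal that dependence.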
Pages: 53-57
Number of pages: 5