SCALABILITY ANALYSIS IN GRACEFULLY-DEGRADABLE LARGE SYSTEMS

Cited by: 4
Authors
NAJJAR, WA
GAUDIOT, JL
Affiliations
[1] UNIV SO CALIF, DEPT ELECT ENGN SYST, LOS ANGELES, CA 90089 USA
[2] UNIV SO CALIF, INST INFORMAT SCI, MARINA DEL REY, CA 90291 USA
Keywords
LARGE SYSTEM; SCALABILITY; COMPUTATIONAL RELIABILITY
DOI
10.1109/24.87126
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The availability of large multiprocessor systems raises new issues in the design of highly fault-tolerant architectures. As the number of active processors in a large system increases, the system's computing power increases, but so does its anticipated failure rate. On the other hand, the hardware redundancy available in such systems is capable of providing, in addition to large amounts of computing power, improved reliability and graceful degradation. This paper analyzes the scalability of large degradable homogeneous multiprocessors. The objective is to assess the limitations, imposed by reliability considerations, on the number of processors. The analysis of the mean time to failure and the mission time shows that, for a given value of the coverage factor, there exists a value of the number of processors at which these measures are maximal. As the system size is increased beyond this value, the reliability of the system becomes a rapidly decreasing function of the number of processors. The measure of processor-hours is defined as the amount of potential computational work. This measure is upper-bounded, and the upper bound is independent of the initial number of processors. For computations with linear speed-up, the amount of reliable computational work is constant for large system sizes. When the speed-up is not linear, this amount is a decreasing function of the number of processors. Therefore, for large system sizes and the same technology, increasing the number of processors results in a decrease in the average amount of reliable computational work the system can deliver. Graceful degradation in large fault-tolerant systems is not scalable. In order to preserve the same performance and reliability level, an increase in the number of processors must be matched by a decrease of the same magnitude in the probability of failed recovery.
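The qualitative claims in the abstract can be illustrated with a minimal sketch. It assumes a simple Markov model that is not taken from the paper: each active processor fails independently at a constant rate `lam`, each failure is recovered from with probability `c` (the coverage factor), and an uncovered failure or the loss of the last processor brings the system down. The function names `mttf`, `processor_hours`, and `best_size` are hypothetical.

```python
def mttf(n, lam, c):
    """Mean time to failure of an n-processor degradable system.

    Assumed model: with n - k processors active, the time to the next
    failure has mean 1 / ((n - k) * lam), and that state is reached only
    if the first k failures were all covered (probability c**k).
    """
    return sum(c**k / ((n - k) * lam) for k in range(n))


def processor_hours(n, lam, c):
    """Expected processor-time delivered before system failure.

    Each state contributes (n - k) processors for a mean duration of
    1 / ((n - k) * lam), so every term reduces to c**k / lam. For c < 1
    the sum is bounded by 1 / (lam * (1 - c)) for every n.
    """
    return sum(c**k for k in range(n)) / lam


def best_size(lam, c, n_max):
    """System size in 1..n_max that maximizes the MTTF under this model."""
    return max(range(1, n_max + 1), key=lambda n: mttf(n, lam, c))


if __name__ == "__main__":
    lam, c = 1.0, 0.9  # illustrative values, not from the paper
    print("MTTF-optimal size:", best_size(lam, c, 50))
    for n in (1, 5, 50, 500):
        print(n, mttf(n, lam, c), processor_hours(n, lam, c))
```

Under these assumptions, the MTTF rises to a maximum at a finite system size and then falls, while the processor-hours measure saturates near `1 / (lam * (1 - c))` whatever the initial size. Raising that ceiling requires reducing `1 - c`, the probability of failed recovery.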
Pages: 189-197
Page count: 9