An Automated Fault Diagnosis System Using Hierarchical Reasoning and Alarm Correlation

Cited by: 29
Authors
C. S. Chao
D. L. Yang
A. C. Liu
Affiliations
[1] Feng Chia University, Department of Information Engineering
[2] Taichung
Keywords
Network management; fault isolation; automated network fault diagnosis; alarm correlation; hierarchical domain-oriented delegated reasoning
DOI
10.1023/A:1011315125608
Abstract
The increasing importance of computer networks in this information age demands a high level of network availability and reliability. As we become more dependent on networks in our so-called cyber-world, network faults and downtime become very costly. Sometimes, a slight fault may cause critical disruptions or irreparable damage to the network while the network manager is lost among a large number of alarm messages. Therefore, developing a practical and effective system for network fault diagnosis is an imperative task. In this paper, we develop a hierarchical domain-oriented reasoning mechanism suitable for the delegated management architecture. It is based on the causality graph of a refined network fault propagation model derived from our empirical study. Based on this hierarchical reasoning mechanism, we propose an automated fault diagnosis system called Alarm Correlation View (ACView) for isolating network faults in a multi-domain environment. This diagnosis system not only automates alarm collection and correlation but also provides efficient fault localization and identification. Furthermore, an alarm-to-fault mapping strategy is used to enhance the fault reasoning capability under uncertain network fault propagation.
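To make the idea of correlating alarms against a causality graph concrete, the following is a minimal sketch and not ACView's actual algorithm, which the abstract does not specify. The fault names, alarm names, graph contents, and the `rank_faults` function are all hypothetical, and the coverage/precision scoring is an assumed stand-in for the paper's alarm-to-fault mapping strategy.

```python
# Hypothetical causality graph: each fault maps to the alarms it may trigger.
# These names are illustrative only and do not come from the paper.
CAUSALITY = {
    "link_down": {"port_alarm", "loss_of_signal", "route_flap"},
    "router_cpu_overload": {"route_flap", "slow_response"},
    "power_failure": {"port_alarm", "loss_of_signal", "route_flap", "slow_response"},
}


def rank_faults(observed, causality=CAUSALITY):
    """Rank candidate faults by how well they explain the observed alarms.

    Assumed scoring (not from the paper):
      coverage  = share of observed alarms that the fault can explain
      precision = share of the fault's expected alarms that were observed
    """
    observed = set(observed)
    scored = []
    for fault, expected in causality.items():
        explained = observed & expected
        if not explained:
            continue  # this fault explains none of the observed alarms
        coverage = len(explained) / len(observed)
        precision = len(explained) / len(expected)
        scored.append((fault, coverage, precision))
    # Best coverage first, then best precision, then name for a stable order.
    scored.sort(key=lambda item: (-item[1], -item[2], item[0]))
    return scored


if __name__ == "__main__":
    for fault, coverage, precision in rank_faults(["port_alarm", "loss_of_signal"]):
        print(f"{fault}: coverage={coverage:.2f}, precision={precision:.2f}")
```

In a multi-domain setting, one could imagine each domain manager running such a ranking over its local alarms and forwarding only its top candidates to a higher-level manager, though that layering is likewise an assumption here.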
Pages: 183-202
Page count: 19