A hybrid machine learning approach to network anomaly detection

Cited by: 264
Authors
Shon, Taeshik
Moon, Jongsub
Affiliations
[1] TN R&D Ctr, IP Lab, Suwon 442600, South Korea
[2] Korea Univ, CIST GSIS, Seoul 136701, South Korea
Keywords
network security; machine learning; anomaly attack; pattern recognition; genetic algorithm; intrusion detection
DOI
10.1016/j.ins.2007.03.025
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Zero-day cyber attacks such as worms and spyware are becoming increasingly widespread and dangerous. Existing signature-based intrusion detection mechanisms are often insufficient for detecting these types of attacks. As a result, anomaly intrusion detection methods have been developed to cope with them. Among the variety of anomaly detection approaches, the Support Vector Machine (SVM) is known to be one of the best machine learning algorithms for classifying abnormal behaviors. The soft-margin SVM is one of the well-known basic SVM methods and uses supervised learning. However, it is not appropriate for detecting novel attacks in Internet traffic, because supervised learning requires pre-acquired training data in which normal and attack traffic are labeled separately. We therefore also apply the one-class SVM, which uses unsupervised learning to detect anomalies and so does not require labeled data. However, the one-class SVM has a downside: it is difficult to use in the real world because of its high false positive rate. In this paper, we propose a new SVM approach, named Enhanced SVM, which combines these two methods in order to provide unsupervised learning with a low false alarm rate similar to that of a supervised SVM approach. We use the following additional techniques to improve the performance of the proposed approach (referred to as Anomaly Detector using Enhanced SVM). First, we create a profile of normal packets using a Self-Organized Feature Map (SOFM), for SVM learning without pre-existing knowledge. Second, we use a packet filtering scheme based on Passive TCP/IP Fingerprinting (PTF) in order to reject incomplete network traffic that violates either the TCP/IP standard or the packet generation policy of well-known platforms. Third, a feature selection technique using a Genetic Algorithm (GA) is used to extract optimized information from raw Internet packets. Fourth, we use the flow of packets based on temporal relationships during data preprocessing, so that the temporal relationships among the inputs are considered in SVM learning. Lastly, we demonstrate the effectiveness of the Enhanced SVM approach with the above-mentioned techniques (SOFM, PTF, and GA) on MIT Lincoln Lab datasets and on a live dataset captured from a real network. The experimental results are verified by m-fold cross validation, and the proposed approach is compared with real-world Network Intrusion Detection Systems (NIDS). © 2007 Elsevier Inc. All rights reserved.
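The fourth technique, preprocessing packets into flows so that temporal relationships reach the learner, can be illustrated with a minimal Python sketch. It assumes a fixed-size sliding window over consecutive packets whose numeric fields are concatenated into one feature vector. The abstract does not state the window size, the step, or how packets are combined, so the function name `sliding_windows` and its `size` and `step` parameters are hypothetical choices for illustration, not the authors' implementation.

```python
from typing import Iterable, Iterator, List, Sequence


def sliding_windows(
    packets: Iterable[Sequence[float]],
    size: int,
    step: int = 1,
) -> Iterator[List[float]]:
    """Yield one flattened feature vector per window of consecutive packets.

    `packets` is a time-ordered sequence of per-packet numeric fields.
    `size` and `step` are assumed parameters, not values from the paper.
    """
    if size < 1 or step < 1:
        raise ValueError("size and step must be positive")
    rows = [list(p) for p in packets]
    # Each window covers `size` consecutive packets; windows start every `step` packets.
    for start in range(0, len(rows) - size + 1, step):
        yield [value for pkt in rows[start:start + size] for value in pkt]


if __name__ == "__main__":
    # Toy packets with two made-up numeric fields each.
    toy = [[1, 10], [2, 20], [3, 30], [4, 40]]
    for vector in sliding_windows(toy, size=2):
        print(vector)
```

Each emitted vector could then serve as a single input sample for a learner such as an SVM, so that a sample reflects a short run of packets instead of one packet in isolation.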
Pages: 3799-3821
Page count: 23