Fuzziness based semi-supervised learning approach for intrusion detection system

Cited by: 354
Authors
Ashfaq, Rana Aamir Raza [1]
Wang, Xi-Zhao [1]
Huang, Joshua Zhexue [1]
Abbas, Haider [2]
He, Yu-Lin [1]
Affiliations
[1] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Guangdong, Peoples R China
[2] King Saud Univ, Riyadh, Saudi Arabia
Funding
China Postdoctoral Science Foundation;
Keywords
Fuzziness; Divide-and-conquer strategy; Semi-supervised learning; Intrusion detection; Random weight neural network; NEURAL-NETWORKS; CLASSIFICATION; REGRESSION; ALGORITHM; ENSEMBLES; WEIGHTS;
DOI
10.1016/j.ins.2016.04.019
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Countering cyber threats, especially attack detection, is a challenging area of research in the field of information assurance. Intruders use polymorphic mechanisms to masquerade the attack payload and evade detection techniques. Many supervised and unsupervised learning approaches from the fields of machine learning and pattern recognition have been used to increase the efficacy of intrusion detection systems (IDSs). Supervised learning approaches use only labeled samples to train a classifier, but obtaining sufficient labeled samples is cumbersome and requires the efforts of domain experts. However, unlabeled samples can easily be obtained in many real-world problems. Compared to supervised learning approaches, semi-supervised learning (SSL) addresses this issue by considering a large number of unlabeled samples together with the labeled samples to build a better classifier. This paper proposes a novel fuzziness-based semi-supervised learning approach that uses unlabeled samples, assisted by a supervised learning algorithm, to improve the classifier's performance for IDSs. A single hidden layer feed-forward neural network (SLFN) is trained to output a fuzzy membership vector, and the unlabeled samples are categorized into low, mid, and high fuzziness categories using the fuzzy quantity. The classifier is retrained after incorporating each category separately into the original training set. The experimental results using this technique of intrusion detection on the NSL-KDD dataset show that unlabeled samples belonging to the low and high fuzziness groups make major contributions to improving the classifier's performance compared to existing classifiers, e.g., naive Bayes, support vector machine, random forests, etc. (C) 2016 Elsevier Inc. All rights reserved.
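The divide-and-conquer procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions the abstract does not state: fuzziness is computed with an entropy-style measure over the membership vector, the three categories are equal-size thirds of the samples ranked by fuzziness, and each unlabeled sample is added with the label of its largest membership. The names `fuzziness`, `split_by_fuzziness`, `retrain_per_group`, `train`, and `predict_membership` are hypothetical; the classifier (an SLFN in the paper) is passed in as two callables.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def fuzziness(mu: Vector, eps: float = 1e-12) -> float:
    """Fuzziness of one membership vector (assumed entropy-style measure).

    Close to 0 for a crisp vector such as [1, 0]; equal to log(2) when
    every membership is 0.5.
    """
    total = 0.0
    for m in mu:
        m = min(max(m, eps), 1.0 - eps)  # keep log() finite
        total += m * math.log(m) + (1.0 - m) * math.log(1.0 - m)
    return -total / len(mu)


def split_by_fuzziness(values: Sequence[float]) -> Dict[str, List[int]]:
    """Split sample indices into low/mid/high groups ranked by fuzziness.

    Equal-size thirds are an assumption of this sketch.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    third = len(order) // 3
    return {
        "low": order[:third],
        "mid": order[third:len(order) - third],
        "high": order[len(order) - third:],
    }


def argmax(mu: Vector) -> int:
    """Index of the largest membership, used here as a pseudo-label."""
    return max(range(len(mu)), key=lambda k: mu[k])


def retrain_per_group(
    train: Callable[[List[Vector], List[int]], object],
    predict_membership: Callable[[object, List[Vector]], List[Vector]],
    X_lab: List[Vector],
    y_lab: List[int],
    X_unl: List[Vector],
) -> Dict[str, object]:
    """Retrain once per fuzziness group, adding that group to the labeled set."""
    base = train(X_lab, y_lab)
    memberships = predict_membership(base, X_unl)
    groups = split_by_fuzziness([fuzziness(mu) for mu in memberships])
    models: Dict[str, object] = {"base": base}
    for name, idx in groups.items():
        X_new = list(X_lab) + [X_unl[i] for i in idx]
        y_new = list(y_lab) + [argmax(memberships[i]) for i in idx]
        models[name] = train(X_new, y_new)
    return models
```

Comparing the test accuracy of `models["low"]`, `models["mid"]`, and `models["high"]` against `models["base"]` would mirror the per-category comparison the abstract reports, under the assumptions listed above.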
Pages: 484-497
Page count: 14