SeLeCT: Self-Learning Classifier for Internet Traffic

Cited by: 29
Authors
Grimaudo, Luigi [1]
Mellia, Marco [1]
Baralis, Elena [2]
Keralapura, Ram [3]
Affiliations
[1] Politecn Torino, Turin, Italy
[2] Politecn Torino, Dipartimento Automat & Informat, Turin, Italy
[3] Narus Inc, Sunnyvale, CA USA
Source
IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT | 2014 / Vol. 11 / Issue 02
Keywords
Traffic classification; clustering; self-seeding; unsupervised machine learning
DOI
10.1109/TNSM.2014.011714.130505
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Network visibility is a critical part of traffic engineering, network management, and security. The most popular current solutions, Deep Packet Inspection (DPI) and statistical classification, rely heavily on the availability of a training set. Besides the cumbersome need to regularly update the signatures, their visibility is limited to the classes the classifier has been trained for. Unsupervised algorithms have been envisioned as a viable alternative to automatically identify classes of traffic. However, the accuracy achieved so far does not allow them to be used for traffic classification in practical scenarios. To address the above issues, we propose SeLeCT, a Self-Learning Classifier for Internet Traffic. It uses unsupervised algorithms along with an adaptive seeding approach to let classes of traffic emerge automatically, so that they can be identified and labeled. Unlike traditional classifiers, it requires neither a priori knowledge of signatures nor a training set to extract the signatures. Instead, SeLeCT automatically groups flows into pure (or homogeneous) clusters using simple statistical features. SeLeCT simplifies label assignment (which is still based on some manual intervention) so that proper class labels can be easily discovered. Furthermore, SeLeCT uses an iterative seeding approach to boost its ability to cope with new protocols and applications. We evaluate the performance of SeLeCT using traffic traces collected in different years from various ISPs located in 3 different continents. Our experiments show that SeLeCT achieves excellent precision and recall, with overall accuracy close to 98%. The biggest advantage of SeLeCT over state-of-the-art classifiers is its ability to discover new protocols and applications in an almost automated fashion.
Pages: 144-157
Page count: 14
Related papers
23 in total
[1] [Anonymous], 2006, Introduction to Data Mining
[2] Bernaille L., 2006 ACM CONEXT
[3] Casas P., 2013 INT TEL C
[4] Casas P., 2011 INT TEL C
[5]   Clustering unlabeled data with SOMs improves classification of labeled real-world data [J].
Dara, R ;
Kremer, SC ;
Stacey, DA .
PROCEEDING OF THE 2002 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-3, 2002, :2237-2242
[6] Demiriz A., 1999 ANNIE
[7] Erman J., 2006 IEEE GLOBECOM
[8] Erman J., 2006 ACM SIGCOMM
[9]   Offline/realtime traffic classification using semi-supervised learning [J].
Erman, Jeffrey ;
Mahanti, Anirban ;
Arlitt, Martin ;
Cohen, Ira ;
Williamson, Carey .
PERFORMANCE EVALUATION, 2007, 64 (9-12) :1194-1213
[10]   Experiences of Internet Traffic Monitoring with Tstat [J].
Finamore, Alessandro ;
Mellia, Marco ;
Meo, Michela ;
Munafo, Maurizio M. ;
Rossi, Dario .
IEEE NETWORK, 2011, 25 (03) :8-14