A preliminary performance comparison of five machine learning algorithms for practical IP traffic flow classification

Cited by: 439
Authors
Williams, Nigel [1]
Zander, Sebastian [1]
Armitage, Grenville [1]
Affiliations
[1] Swinburne Univ Technol, CAIA, Melbourne, Vic, Australia
Keywords
algorithms; measurement; traffic classification; machine learning
DOI
10.1145/1163593.1163596
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The identification of network applications through observation of associated packet traffic flows is vital to the areas of network management and surveillance. Currently popular methods such as port number and payload-based identification exhibit a number of shortfalls. An alternative is to use machine learning (ML) techniques and identify network applications based on per-flow statistics, derived from payload-independent features such as packet length and inter-arrival time distributions. The performance impact of feature set reduction, using Consistency-based and Correlation-based feature selection, is demonstrated on Naive Bayes, C4.5, Bayesian Network and Naive Bayes Tree algorithms. We then show that it is useful to differentiate algorithms based on computational performance rather than classification accuracy alone, as although classification accuracy between the algorithms is similar, computational performance can differ significantly.
Pages: 7-15
Page count: 9