Learning to Detect Malicious URLs

Times cited: 131
Authors
Ma, Justin [1]
Saul, Lawrence K. [2]
Savage, Stefan [2]
Voelker, Geoffrey M. [2]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] Univ Calif San Diego, San Diego, CA 92103 USA
Funding
U.S. National Science Foundation
Keywords
Algorithms; Security; Online learning; Malicious Web sites; Perceptron
DOI
10.1145/1961189.1961202
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Malicious Web sites are a cornerstone of Internet criminal activities. The dangers of these sites have created a demand for safeguards that protect end-users from visiting them. This article explores how to detect malicious Web sites from the lexical and host-based features of their URLs. We show that this problem lends itself naturally to modern algorithms for online learning. Online algorithms not only process large numbers of URLs more efficiently than batch algorithms but also adapt more quickly to new features in the continuously evolving distribution of malicious URLs. We develop a real-time system for gathering URL features and pair it with a real-time feed of labeled URLs from a large Web mail provider. From these features and labels, we are able to train an online classifier that detects malicious Web sites with 99% accuracy over a balanced dataset.
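The pipeline the abstract describes extracts features from each URL and updates a classifier one labeled URL at a time. The minimal Python sketch below illustrates that idea and rests on several assumptions. It uses a plain perceptron (listed among the keywords), which is not necessarily the algorithm the article ultimately recommends. It uses only lexical tokens, omitting host-based features such as DNS or WHOIS data. The tokenization rule, the label convention (+1 malicious, -1 benign), and the names `lexical_features` and `Perceptron` are hypothetical choices made for illustration.

```python
import re
from urllib.parse import urlsplit


def lexical_features(url: str) -> set[str]:
    """Hypothetical bag-of-words lexical features for one URL.

    Hostname and path tokens get separate prefixes, so "login" in the
    hostname and "login" in the path become two distinct features.
    """
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    path = parts.path + ("?" + parts.query if parts.query else "")
    feats = {"host:" + t for t in re.split(r"[./?=\-_]+", host) if t}
    feats |= {"path:" + t for t in re.split(r"[./?=&\-_]+", path) if t}
    return feats


class Perceptron:
    """Online perceptron over sparse binary features; labels are +1 / -1."""

    def __init__(self) -> None:
        # Sparse weights: a token seen for the first time simply has weight 0.
        self.w: dict[str, float] = {}
        self.b = 0.0

    def score(self, feats: set[str]) -> float:
        return self.b + sum(self.w.get(f, 0.0) for f in feats)

    def predict(self, feats: set[str]) -> int:
        # +1 = malicious, -1 = benign (label convention assumed here).
        return 1 if self.score(feats) > 0 else -1

    def update(self, feats: set[str], label: int) -> bool:
        """Process one labeled URL; change weights only on a mistake."""
        if label * self.score(feats) > 0:
            return False  # correct with a positive margin: no update
        for f in feats:
            self.w[f] = self.w.get(f, 0.0) + label
        self.b += label
        return True


if __name__ == "__main__":
    # Toy labeled stream (made-up URLs on reserved example domains).
    stream = [
        ("http://login.verify-account.example.net/update", 1),
        ("http://www.example.org/docs/index.html", -1),
    ]
    model = Perceptron()
    for _ in range(3):  # a few passes over the toy stream
        for url, label in stream:
            model.update(lexical_features(url), label)
    # Unseen URL sharing tokens with the malicious example.
    print(model.predict(lexical_features("http://secure-login.example.net/update")))  # → 1
```

The weights are stored in a dictionary rather than a fixed-size vector. As a result, a token that has never appeared before (for example, a newly registered domain name) enters the model the first time it is involved in a mistake, with no retraining on past data. This loosely mirrors the abstract's point that online algorithms adapt quickly to new features.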
Pages: 24