Automatic analysis of malware behavior using machine learning

Cited by: 378
Authors
Rieck, Konrad [1]
Trinius, Philipp [2]
Willems, Carsten [2]
Holz, Thorsten [2, 3]
Affiliations
[1] Berlin Inst Technol, Berlin, Germany
[2] Univ Mannheim, Mannheim, Germany
[3] Vienna Univ Technol, Vienna, Austria
Keywords
Malicious software; behavior-based analysis; clustering; classification
DOI
10.3233/JCS-2010-0410
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Malicious software, so-called malware, poses a major threat to the security of computer systems. The amount and diversity of its variants render classic security defenses ineffective, such that millions of hosts on the Internet are infected with malware in the form of computer viruses, Internet worms and Trojan horses. While obfuscation and polymorphism employed by malware largely impede detection at file level, the dynamic analysis of malware binaries during run-time provides an instrument for characterizing and defending against the threat of malicious software. In this article, we propose a framework for the automatic analysis of malware behavior using machine learning. The framework allows for automatically identifying novel classes of malware with similar behavior (clustering) and assigning unknown malware to these discovered classes (classification). Based on both clustering and classification, we propose an incremental approach for behavior-based analysis, capable of processing the behavior of thousands of malware binaries on a daily basis. The incremental analysis significantly reduces the run-time overhead of current analysis methods, while providing accurate discovery and discrimination of novel malware variants.
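The abstract does not spell out the algorithms, so the following is only a minimal sketch of how clustering and classification could be combined incrementally. It assumes each behavior report has already been embedded as a numeric feature vector, and it swaps in a simple nearest-prototype classifier and greedy leader clustering for illustration. The names `classify`, `cluster`, `incremental_step`, and the `max_dist` threshold are hypothetical and are not taken from the article.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(report, prototypes, max_dist):
    """Return the label of the nearest prototype within max_dist, else None."""
    best_label, best_dist = None, max_dist
    for label, proto in prototypes.items():
        d = distance(report, proto)
        if d <= best_dist:
            best_label, best_dist = label, d
    return best_label


def cluster(reports, max_dist):
    """Greedy leader clustering (an assumption, not the article's method).

    Each report joins the first cluster whose leader (first member)
    lies within max_dist; otherwise it starts a new cluster.
    """
    clusters = []
    for r in reports:
        for c in clusters:
            if distance(r, c[0]) <= max_dist:
                c.append(r)
                break
        else:
            clusters.append([r])
    return clusters


def incremental_step(reports, prototypes, max_dist):
    """Process one batch: classify against known classes, cluster the rest.

    Mutates `prototypes` by adding one prototype per newly found cluster.
    Returns (assigned, new_labels), where `assigned` maps report index
    to an existing class label.
    """
    assigned, unknown = {}, []
    for i, r in enumerate(reports):
        label = classify(r, prototypes, max_dist)
        if label is None:
            unknown.append(r)
        else:
            assigned[i] = label

    new_labels = []
    counter = len(prototypes)
    for c in cluster(unknown, max_dist):
        label = f"new_{counter}"
        while label in prototypes:  # avoid clashing with existing labels
            counter += 1
            label = f"new_{counter}"
        prototypes[label] = c[0]  # the cluster leader serves as prototype
        new_labels.append(label)
        counter += 1
    return assigned, new_labels


# Hypothetical data: one known class, then a batch with two novel samples.
prototypes = {"worm": (0.0, 0.0)}
batch = [(0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
assigned, new_labels = incremental_step(batch, prototypes, max_dist=1.0)
```

In this sketch only the reports that match no known class reach the clustering step, which is the intuition behind reducing run-time overhead: later batches are mostly handled by the cheaper classification step.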
Pages: 639-668
Number of pages: 30