Algorithms to accelerate multiple regular expressions matching for deep packet inspection

被引:264
作者
Kumar, Sailesh [1 ]
Dharmapurikar, Sarang
Yu, Fang
Crowley, Patrick
Turner, Jonathan
机构
[1] Washington Univ, St Louis, MO 63130 USA
[2] Univ Calif Berkeley, Dept Comp Sci, Berkeley, CA 94720 USA
关键词
algorithms; design; security; DFA; regular expressions; deep packet inspection;
D O I
10.1145/1151659.1159952
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
There is a growing demand for network devices capable of examining the content of data packets in order to improve network security and provide application-specific services. Most high performance systems that perform deep packet inspection implement simple string matching algorithms to match packets against a large, but finite set of strings. However, there is growing interest in the use of regular expression-based pattern matching, since regular expressions offer superior expressive power and flexibility. Deterministic finite automata (DFA) representations are typically used to implement regular expressions. However, DFA representations of regular expression sets arising in network applications require large amounts of memory, limiting their practical application. In this paper, we introduce a new representation for regular expressions, called the Delayed Input DFA (D 2 FA), which substantially reduces space requirements as compared to a DFA. AD 2 FA is constructed by transforming a DFA via incrementally replacing several transitions of the automaton with a single default transition. Our approach dramatically reduces the number of distinct transitions between states. For a collection of regular expressions drawn from current commercial and academic systems, a D 2 FA representation reduces transitions by more than 95%. Given the substantially reduced space requirements, we describe an efficient architecture that can perform deep packet inspection at multi-gigabit rates. Our architecture uses multiple on-chip memories in such a way that each remains uniformly occupied and accessed over a short duration, thus effectively distributing the load and enabling high throughput. Our architecture can provide cost-effective packet content scanning at OC-192 rates with memory requirements that are consistent with current ASIC technology.
引用
收藏
页码:339 / 350
页数:12
相关论文
共 31 条
  • [1] EFFICIENT STRING MATCHING - AID TO BIBLIOGRAPHIC SEARCH
    AHO, AV
    CORASICK, MJ
    [J]. COMMUNICATIONS OF THE ACM, 1975, 18 (06) : 333 - 340
  • [2] ANTONATOS S, 2004, ACM WORKSH SOFTW PER
  • [3] Baker ZK, 2004, LECT NOTES COMPUT SC, V3203, P311
  • [4] Deep packet filter with dedicated logic and read only memories
    Cho, YH
    Mangione-Smith, WH
    [J]. 12TH ANNUAL IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, PROCEEDINGS, 2004, : 125 - 134
  • [5] CLARK CR, P 13 INT C FIELD PRO
  • [6] COMMENTZWALTER B, 1979, P 6 INT C AUT LANG P, P118
  • [7] DHARMAPURIKAR S, 2003, IEEE HOT INT 12 AUG
  • [8] Eatherton W., ENCODED VERSION REG
  • [9] THE COMPILATION OF REGULAR EXPRESSIONS INTO INTEGRATED-CIRCUITS
    FLOYD, RW
    ULLMAN, JD
    [J]. JOURNAL OF THE ACM, 1982, 29 (03) : 603 - 622
  • [10] GAREY MR, 1979, COMPUTERS INTRACTABI, P208