Combining SIMD and Many/Multi-core Parallelism for Finite State Machines with Enumerative Speculation

Cited by: 2
Authors
Jiang, Peng [1 ]
Agrawal, Gagan [1 ]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Keywords
enumerative speculation; finite state machines; simd;
DOI
10.1145/3018743.3018760
CLC number
TP31 [Computer Software];
Discipline codes
081202; 0835
Abstract
The Finite State Machine (FSM) is the key kernel behind many popular applications, including regular expression matching, text tokenization, and Huffman decoding. Parallelizing FSMs is extremely difficult because of their strong dependencies and unpredictable memory accesses. Previous efforts have largely focused on multi-core parallelization and have used different approaches, including speculative and enumerative execution, both of which are effective but have limitations. With the increasing width and improving flexibility of SIMD instruction sets, this paper focuses on combining SIMD and multi/many-core parallelism for FSMs. We have developed a novel strategy, called enumerative speculation. Instead of speculating on a single state, as in speculative execution, or enumerating all possible states, as in enumerative execution, our strategy speculates transitions from several possible states, reducing both the misprediction overheads of the speculative approach and the large amount of redundant work of the enumerative approach. A simple lookback approach produces the set of guessed states, achieving high speculation success rates in our enumerative speculation. We evaluate our method on four popular FSM applications: Huffman decoding, regular expression matching, HTML tokenization, and Div7. We obtain up to 2.5x speedup using SIMD on one core and up to 95x combining SIMD with 60 cores of an Intel Xeon Phi. On a single core, we outperform the best single-state speculative execution version by an average of 1.6x; combining SIMD and many-core parallelism, we outperform enumerative execution by an average of 2x.
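The core idea in the abstract can be illustrated with a small sequential sketch. This is a hypothetical reconstruction, not the authors' SIMD/many-core implementation: the example machine is Div7 (state = running value of a bit stream mod 7, one of the paper's benchmarks), and the function names, chunk count, and lookback length are illustrative assumptions. Each chunk is speculatively executed from a small set of guessed states (found by replaying a short suffix of the previous chunk from every state), rather than from one guessed state (speculative execution) or all states (enumerative execution).

```python
def run_fsm(table, state, symbols):
    """Run a table-driven FSM over `symbols` starting from `state`."""
    for s in symbols:
        state = table[state][s]
    return state

def enum_speculate(table, start, symbols, num_chunks=4, lookback=8):
    """Hypothetical sketch of enumerative speculation (sequential model).

    In the real system, phase 1 runs in parallel across cores and SIMD
    lanes; here it is a plain loop to keep the sketch self-contained.
    """
    n = len(symbols)
    cuts = [n * i // num_chunks for i in range(num_chunks + 1)]
    chunks = [symbols[cuts[i]:cuts[i + 1]] for i in range(num_chunks)]
    num_states = len(table)

    # Lookback: replay a short suffix of the previous chunk from *every*
    # state; the few states it converges to become the guess set for the
    # next chunk, instead of one guess or all states.
    guesses = [[start]]
    for i in range(1, num_chunks):
        suffix = chunks[i - 1][-lookback:]
        guesses.append(sorted({run_fsm(table, q, suffix)
                               for q in range(num_states)}))

    # Phase 1 (parallelizable): run each chunk once per guessed state.
    outcomes = [{g: run_fsm(table, g, chunks[i]) for g in guesses[i]}
                for i in range(num_chunks)]

    # Phase 2 (sequential stitch): propagate the true state; re-execute a
    # chunk only on misspeculation (true start state not in the guess set).
    state = start
    for i in range(num_chunks):
        if state in outcomes[i]:
            state = outcomes[i][state]          # speculation hit
        else:
            state = run_fsm(table, state, chunks[i])  # misspeculation
    return state

# Div7: reading bits most-significant-first, state = value mod 7.
div7 = [[(2 * q + b) % 7 for b in (0, 1)] for q in range(7)]
bits = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1] * 5
assert enum_speculate(div7, 0, bits) == run_fsm(div7, 0, bits)
```

The payoff is that the guess set shrinks the per-chunk work from `num_states` FSM runs (enumeration) toward one run, while keeping the misspeculation rate far lower than a single-state guess.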
Pages: 179-191 (13 pages)