NR-grep: a fast and flexible pattern-matching tool

被引:58
作者
Navarro, G [1 ]
机构
[1] Univ Chile, Dept Comp Sci, Santiago, Chile
关键词
online string matching; regular expression searching; approximate string matching; grep; agrep; BNDM;
D O I
10.1002/spe.411
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
We present nrgrep ('non-deterministic reverse grep'), a new pattern-matching tool designed for efficient search of complex patterns. Unlike previous tools of the grep family, such as agrep and Gnu grep, nrgrep is based on a single and uniform concept: the bit-parallel simulation of a non-deterministic suffix automaton. As a result, nrgrep can find from simple patterns to regular expressions, exactly or allowing errors in the matches, with an efficiency that degrades smoothly as the complexity of the searched pattern increases. Another concept that is fully integrated into nrgrep and that contributes to this smoothness is the selection of adequate subpatterns for fast scanning, which is also absent in many current tools. We show that the efficiency of nrgrep is similar to that of the fastest existing string-matching tools for the simplest patterns, and is by far unmatched for more complex patterns. Copyright (C) 2001 John Wiley & Sons, Ltd.
引用
收藏
页码:1265 / 1312
页数:48
相关论文
共 29 条
[1]  
Baeza-yates R. A., 1992, 12 IFIP WORLD COMP C, VI, P465
[2]   Faster approximate string matching [J].
BaezaYates, R ;
Navarro, G .
ALGORITHMICA, 1999, 23 (02) :127-158
[3]   A NEW APPROACH TO TEXT SEARCHING [J].
BAEZAYATES, R ;
GONNET, GH .
COMMUNICATIONS OF THE ACM, 1992, 35 (10) :74-82
[4]  
BAEZAYATES RA, 1999, MODERN INFORMATION R
[5]   FROM REGULAR EXPRESSIONS TO DETERMINISTIC AUTOMATA [J].
BERRY, G ;
SETHI, R .
THEORETICAL COMPUTER SCIENCE, 1986, 48 (01) :117-126
[6]   FAST STRING SEARCHING ALGORITHM [J].
BOYER, RS ;
MOORE, JS .
COMMUNICATIONS OF THE ACM, 1977, 20 (10) :762-772
[7]  
COMMENTZWALTER B, 1979, LECTURE NOTES COMPUT, V71, P118
[8]   SPEEDING-UP 2 STRING-MATCHING ALGORITHMS [J].
CROCHEMORE, M ;
CZUMAJ, A ;
GASIENIEC, L ;
JAROMINEK, S ;
LECROQ, T ;
PLANDOWSKI, W ;
RYTTER, W .
ALGORITHMICA, 1994, 12 (4-5) :247-267
[9]  
El-Mabrouk N., 1996, LNCS, V1075, P24
[10]  
Harman D., 1995, P 3 TEXT RETR C TREC, V500- 207, P1