A randomized algorithm for approximate string matching

被引:28
作者
Atallah, MJ [1 ]
Chyzak, F
Dumas, P
机构
[1] Purdue Univ, CERIAS, W Lafayette, IN 47907 USA
[2] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
[3] Inst Natl Rech Informat & Automat, F-78153 Le Chesnay, France
关键词
convolution; FFT; approximate string matching; randomized algorithms;
D O I
10.1007/s004530010062
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
We give a randomized algorithm in deterministic time O(N log M) for estimating the score vector of matches between a text string of length N and a pattern string of length M, i.e., the vector obtained when the pattern is slid along the text, and the number of matches is counted for each position. A direct application is approximate string matching. The randomized algorithm uses convolution to find an estimator of the scores; the variance of the estimator is particularly small for scores that are close to M, i.e., for approximate occurrences of the pattern in the text. No assumption is made about the probabilistic characteristics of the input, or about the size of the alphabet. The solution extends to string matching with classes, class complements, "never match" and "always match" symbols, to the weighted case and to higher dimensions.
引用
收藏
页码:468 / 486
页数:19
相关论文
共 11 条
[1]   GENERALIZED STRING MATCHING [J].
ABRAHAMSON, K .
SIAM JOURNAL ON COMPUTING, 1987, 16 (06) :1039-1051
[2]  
ATALLAH MJ, 1992, LECT NOTES COMPUT SC, V644, P27
[3]  
Atallah MJ, 1996, INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, PROCEEDINGS - VOL II, P349, DOI 10.1109/ICIP.1996.560828
[4]   A NEW APPROACH TO TEXT SEARCHING [J].
BAEZAYATES, R ;
GONNET, GH .
COMMUNICATIONS OF THE ACM, 1992, 35 (10) :74-82
[5]   Fast and practical approximate string matching [J].
BaezaYates, RA ;
Perleberg, CH .
INFORMATION PROCESSING LETTERS, 1996, 59 (01) :21-27
[6]  
Crochemore M., 1994, TEXT ALGORITHMS
[7]  
Fischer Michael J., 1974, SIAM AMS P, V7, P113
[8]   FAST ALGORITHMS FOR APPROXIMATELY COUNTING MISMATCHES [J].
KARLOFF, H .
INFORMATION PROCESSING LETTERS, 1993, 48 (02) :53-60
[9]  
Knuth DE, 1981, ART COMPUTER PROGRAM, V2
[10]  
Kumar Sandeep., 1994, P 17 NATL COMPUTER S, P11