The String Edit Distance Matching Problem With Moves

被引:83
作者
Cormode, Graham [1 ,3 ]
Muthukrishnan, S. [2 ,4 ]
机构
[1] AT&T Labs Res, 180 Pk Ave, Florham Pk, NJ 07932 USA
[2] Rutgers State Univ, Dept Comp & Informat Sci, Piscataway, NJ 08854 USA
[3] Univ Warwick, Coventry, W Midlands, England
[4] AT&T Res, Florham Pk, NJ USA
关键词
Approximate pattern matching; data streams; edit distance; embedding; similarity search; string matching;
D O I
10.1145/1186810.1186812
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
The edit distance between two strings S and R is defined to be the minimum number of character inserts, deletes, and changes needed to convert R to S. Given a text string t of length n, and a pattern string p of length m, informally, the string edit distance matching problem is to compute the smallest edit distance between p and substrings of t. We relax the problem so that: (a) we allow an additional operation, namely, substring moves; and (b) we allow approximation of this string edit distance. Our result is a near-linear time deterministic algorithm to produce a factor of O(log n log* n) approximation to the string edit distance with moves. This is the first known significantly subquadratic algorithm for a string edit distance problem in which the distance involves nontrivial alignments. Our results are obtained by embedding strings into L-1 vector space using a simplified parsing technique, which we call edit-sensitive parsing (ESP).
引用
收藏
页数:19
相关论文
共 29 条
[1]   GENERALIZED STRING MATCHING [J].
ABRAHAMSON, K .
SIAM JOURNAL ON COMPUTING, 1987, 16 (06) :1039-1051
[2]  
Alstrup S, 2000, PROCEEDINGS OF THE ELEVENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P819
[3]  
Amir A, 2000, PROCEEDINGS OF THE ELEVENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P794
[4]  
ANDERSON RJ, 1991, ALGORITHMICA, V6, P859, DOI 10.1007/BF01759076
[5]  
Cole R., 1986, P 18 ANN S THEOR COM, P206, DOI [10.1145/12130.12151, DOI 10.1145/12130.12151]
[6]  
CORMODE G., 2000, P ACM S THEOR COMP, P315
[7]  
Crochemore M., 1994, TEXT ALGORITHMS
[8]  
Goel A, 2001, SIAM PROC S, P769
[9]  
Goldberg Andrew, 1987, P S THEOR COMP STOC, P315
[10]  
Gusfield D., 1997, ALGORITHMS STRINGS T