APPROXIMATE STRING-MATCHING WITH SUFFIX AUTOMATA

被引:40
作者
UKKONEN, E [1 ]
WOOD, D [1 ]
机构
[1] UNIV WATERLOO,DEPT COMP SCI,WATERLOO N2L 3G1,ONTARIO,CANADA
关键词
APPROXIMATE STRING MATCHING; EDIT DISTANCE; DYNAMIC PROGRAMMING; SUFFIX AUTOMATON; AHO-CORASICK AUTOMATON;
D O I
10.1007/BF01769703
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
The approximate string matching problem is, given a text string, a pattern string, and an integer k, to find in the text all approximate occurrences of the pattern. An approximate occurrence means a substring of the text with edit distance at most k from the pattern. We give a new O(kn) algorithm for this problem, where n is the length of the text. The algorithm is based on the suffix automaton with failure transitions and on the diagonalwise monotonicity of the edit distance table. Some experiments showing that the algorithm has a small overhead are reported.
引用
收藏
页码:353 / 364
页数:12
相关论文
共 16 条
[1]   EFFICIENT STRING MATCHING - AID TO BIBLIOGRAPHIC SEARCH [J].
AHO, AV ;
CORASICK, MJ .
COMMUNICATIONS OF THE ACM, 1975, 18 (06) :333-340
[2]   THE SMALLEST AUTOMATION RECOGNIZING THE SUBWORDS OF A TEXT [J].
BLUMER, A ;
BLUMER, J ;
HAUSSLER, D ;
EHRENFEUCHT, A ;
CHEN, MT ;
SEIFERAS, J .
THEORETICAL COMPUTER SCIENCE, 1985, 40 (01) :31-55
[3]   TRANSDUCERS AND REPETITIONS [J].
CROCHEMORE, M .
THEORETICAL COMPUTER SCIENCE, 1986, 45 (01) :63-86
[4]  
CROCHEMORE M, 1988, LECT NOTES COMPUT SC, V324, P44, DOI 10.1007/BFb0017130
[5]   AN IMPROVED ALGORITHM FOR APPROXIMATE STRING MATCHING [J].
GALIL, Z ;
PARK, K .
SIAM JOURNAL ON COMPUTING, 1990, 19 (06) :989-999
[6]  
Galil Z., 1988, Journal of Complexity, V4, P33, DOI 10.1016/0885-064X(88)90008-8
[7]   TIME OPTIMAL LEFT-TO-RIGHT CONSTRUCTION OF POSITION TREES [J].
KEMPF, M ;
BAYER, R ;
GUNTZER, U .
ACTA INFORMATICA, 1987, 24 (04) :461-474
[8]  
LANDAU G, 26TH P FOCS, P126
[9]  
LANDAU G, 18TH STOC P, P220
[10]   FAST PARALLEL AND SERIAL APPROXIMATE STRING MATCHING [J].
LANDAU, GM ;
VISHKIN, U .
JOURNAL OF ALGORITHMS, 1989, 10 (02) :157-169