A guided tour to approximate string matching

被引:1644
作者
Navarro, G [1 ]
机构
[1] Univ Chile, Dept Comp Sci, Santiago, Chile
关键词
algorithms; edit distance; Levenshtein distance; online string matching; text searching allowing errors;
D O I
10.1145/375360.375365
中图分类号
TP301 [理论、方法];
学科分类号
081202 [计算机软件与理论];
摘要
We survey the current techniques to cope with the problem of string matching that allows errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms and their complexities. We present a number of experiments to compare the performance of the different algorithms and show which are the best choices. We conclude with some directions for future work and open problems.
引用
收藏
页码:31 / 88
页数:58
相关论文
共 158 条
[1]
Aho A.V., 1974, The Design and Analysis of Computer Algorithms
[2]
EFFICIENT STRING MATCHING - AID TO BIBLIOGRAPHIC SEARCH [J].
AHO, AV ;
CORASICK, MJ .
COMMUNICATIONS OF THE ACM, 1975, 18 (06) :333-340
[3]
ALTSCHUL SF, 1990, J MOL BIOL, V215, P403, DOI 10.1006/jmbi.1990.9999
[4]
Pattern matching with swaps [J].
Amir, A ;
Aumann, Y ;
Landau, GM ;
Lewenstein, M ;
Lewenstein, N .
38TH ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 1997, :144-153
[5]
Amir A, 1997, LECT NOTES COMPUT SC, V1272, P160
[6]
[Anonymous], 1975, J APPL PROBAB
[7]
THE LONGEST COMMON SUBSEQUENCE PROBLEM REVISITED [J].
APOSTOLICO, A ;
GUERRA, C .
ALGORITHMICA, 1987, 2 (03) :315-336
[8]
Apostolico A, 1997, Pattern Matching Algorithms
[9]
Apostolico A., 1985, COMBINATORIAL ALGORI, P85
[10]
Arlazarov V. L., 1970, SOV MATH DOKL, V11, P1209