SUBLINEAR APPROXIMATE STRING-MATCHING AND BIOLOGICAL APPLICATIONS

被引：97

作者：

CHANG, WI ^{[1
]}

LAWLER, EL ^{[1
]}

机构：

[1] UNIV CALIF BERKELEY, DIV COMP SCI, BERKELEY, CA 94720 USA

来源：

ALGORITHMICA | 1994年 / 12卷 / 4-5期

关键词：

PATTERN MATCHING; EDIT DISTANCE; SUFFIX TREE; LOWEST COMMON ANCESTOR; CHERNOFF BOUND; COMPUTATIONAL BIOLOGY;

D O I：

10.1007/BF01185431

中图分类号：

TP31 [计算机软件];

学科分类号：

081202 ; 0835 ;

摘要：

Given a text string of length n and a pattern string of length m over a b-letter alphabet, the k differences approximate string matching problem asks for all locations in the text where the pattern occurs with at most k differences (substitutions, insertions, deletions). We treat k not as a constant but as a fraction of m (not necessarily constant-fraction). Previous algorithms require at least 0(kn) time (or exponential space). We give an algorithm that is sublinear time O((n/m)k log(b) m) when the text is random and k is bounded by the threshold m/(log(b) m + 0(1)). In particular, when k = o(m/log(b) m) the expected running time is o(n). In the worst case our algorithm is 0(kn), but is still an improvement in that it is practical and uses 0(m) space compared with 0(n) or 0(m2). We define three problems motivated by molecular biology and describe efficient algorithms based on our techniques: (1) approximate substring matching, (2) approximate-overlap detection, and (3) approximate codon matching. Respectively, applications to biology are local similarity search, sequence assembly, and DNA-protein matching.

引用

页码：327 / 344

页数：18

共 51 条

[1] EFFICIENT STRING MATCHING - AID TO BIBLIOGRAPHIC SEARCH
AHO, AV
CORASICK, MJ
[J]. COMMUNICATIONS OF THE ACM, 1975, 18 (06) : 333 - 340
[2] BASIC LOCAL ALIGNMENT SEARCH TOOL
ALTSCHUL, SF
GISH, W
MILLER, W
MYERS, EW
LIPMAN, DJ
[J]. JOURNAL OF MOLECULAR BIOLOGY, 1990, 215 (03) : 403 - 410
[3] Apostolico A., 1985, COMBINATORIAL ALGORI, P85, DOI DOI 10.1007/978-3-642-82456-2_6
[4] CHANG WI, 1990, ANN IEEE SYMP FOUND, P116
[5] CHANG WI, 1992, LECT NOTES COMPUT SC, V644, P175
[6] CHANG WI, 1994, IN PRESS LECTURE NOT
[7] CHANG WI, 1991, THESIS U CALIFORNIA
[8] CHANG WI, 1990, FAST IMPLEMENTATION
[9] CHANG WI, UCBCSD91653654 COMP
[10] CHANG WI, 1990, HUMAN GENOME 2 OFFIC, P24

← 1 2 3 4 5 6 →