Lempel-Ziv index for q-grams

被引:18
作者
Karkkainen, J [1 ]
Sutinen, E [1 ]
机构
[1] Univ Helsinki, Dept Comp Sci, FIN-00014 Helsinki, Finland
关键词
q-gram index; approximate pattern matching; text indexing; Lempel-Ziv parsing; string algorithms; data compression;
D O I
10.1007/PL00009205
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
We present a new sublinear-size index structure for finding all occurrences of a given q-gram in a text. Such a q-gram index is needed in many approximate pattern matching algorithms. All earlier q-gram indexes require at least O(n) space, where n is the length of the text. The new Lempel-Ziv index needs only O(n/log n) space while being as fast as previous methods. The new method takes advantage of repetitions in the text found by Lempel-Ziv parsing.
引用
收藏
页码:137 / 154
页数:18
相关论文
共 24 条
[1]  
BAEZAYATES R, 1993, P 19 LAT C INF BUEN, P15
[2]   FAST STRING SEARCHING ALGORITHM [J].
BOYER, RS ;
MOORE, JS .
COMMUNICATIONS OF THE ACM, 1977, 20 (10) :762-772
[3]  
Califano A, 1993, Proc Int Conf Intell Syst Mol Biol, V1, P56
[4]  
CHANG W, 1994, LECT NOTES COMPUT SC, V807, P259
[5]   SUBLINEAR APPROXIMATE STRING-MATCHING AND BIOLOGICAL APPLICATIONS [J].
CHANG, WI ;
LAWLER, EL .
ALGORITHMICA, 1994, 12 (4-5) :327-344
[6]   SIMPLE AND EFFICIENT STRING MATCHING WITH K MISMATCHES [J].
GROSSI, R ;
LUCCIO, F .
INFORMATION PROCESSING LETTERS, 1989, 33 (03) :113-120
[7]  
Karkkainen J., 1996, Proceedings of the Third South American Workshop on String Processing. WSP 1996, P141
[8]   FAST STRING MATCHING WITH K-DIFFERENCES [J].
LANDAU, GM ;
VISHKIN, U .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 1988, 37 (01) :63-78
[9]  
Lehtinen O., 1996, Proceedings of the Third South American Workshop on String Processing. WSP 1996, P183
[10]   COMPLEXITY OF FINITE SEQUENCES [J].
LEMPEL, A ;
ZIV, J .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1976, 22 (01) :75-81