A time and space efficient data structure for string searching on large texts

被引:18
作者
Colussi, L
DeCol, A
机构
[1] Dipto. di Matemat. Pura ed Applicata, Università di Padova, I-35131 Padova
关键词
algorithms; data structures; string matching;
D O I
10.1016/0020-0190(96)00061-0
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Suffix tree and suffix array are data structures that allow fast search in a large static text. By using the suffix tree data structure we can find all k occurrences of a pattern w in a text of length n in time O(\w\ + k). The same problem can be solved by using the suffix array data structure in time O(\w\ + log(n) + k), Thus suffix trees perform better than suffix arrays with respect to the search time. On the other hand suffix trees require as much as four times more memory space than suffix arrays. We propose a new data structure, the augmented suffix array, that allows searching in O(\w\ + log log(n) + k) time and requires about the same memory space as the suffix array. Moreover, in case of very large texts, most of the new data structure and the text itself can be stored in secondary memory without compromising search operation efficiency. This is not the case for both suffix trees and suffix arrays.
引用
收藏
页码:217 / 222
页数:6
相关论文
共 8 条
[1]   EFFICIENT IMPLEMENTATION OF SUFFIX TREES [J].
ANDERSSON, A ;
NILSSON, S .
SOFTWARE-PRACTICE & EXPERIENCE, 1995, 25 (02) :129-141
[2]  
Apostolico A., 1985, COMBINATORIAL ALGORI, P85
[3]  
Crochemore M., 1994, TEXT ALGORITHMS
[4]  
KARKKAINEN J, 1995, P CPM 95, P262
[5]   SUFFIX ARRAYS - A NEW METHOD FOR ONLINE STRING SEARCHES [J].
MANBER, U ;
MYERS, G .
SIAM JOURNAL ON COMPUTING, 1993, 22 (05) :935-948
[6]   SPACE-ECONOMICAL SUFFIX TREE CONSTRUCTION ALGORITHM [J].
MCCREIGHT, EM .
JOURNAL OF THE ACM, 1976, 23 (02) :262-272
[7]  
Weiner P., 1973, 14th Annual Symposium on Switching Automata Theory, P1
[8]  
[No title captured]