Suffix trees on words

被引:21
作者
Andersson, A [1 ]
Larsson, NJ [1 ]
Swanson, K [1 ]
机构
[1] Lund Univ, Dept Comp Sci, S-22100 Lund, Sweden
关键词
suffix trees; substring searching;
D O I
10.1007/PL00009260
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
We discuss an intrinsic generalization of the suffix tree, designed to index a string of length n which has a natural partitioning into m multicharacter substrings or words. This word suffix tree represents only the m suffixes that start at word boundaries. These boundaries are determined by delimiters, whose definition depends on the application. Since traditional suffix tree construction algorithms rely heavily on the fact that all suffixes are inserted, construction of a word suffix tree is nontrivial, in particular when only O(nl) construction space is allowed. We solve this problem, presenting an algorithm with O(rt) expected running time, in general, construction cost is Omega (n) due to the need of scanning the entire input. In applications that require strict node ordering, an additional cost of sorting O(m') characters arises, where m' is the number of distinct words. In either case, this is a significant improvement over previously known solutions. Furthermore, when the alphabet is small, we may assume that the II characters in the input string occupy o(n) machine words. We illustrate that this can allow a word suffix tree to be built in sublinear time.
引用
收藏
页码:246 / 260
页数:15
相关论文
共 18 条
[1]  
Andersson A., 1995, Proceedings of the Twenty-Seventh Annual ACM Symposium on the Theory of Computing, P427, DOI 10.1145/225058.225173
[2]   Faster deterministic sorting and searching in linear space [J].
Andersson, A .
37TH ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 1996, :135-141
[3]   IMPROVED BEHAVIOR OF TRIES BY ADAPTIVE BRANCHING [J].
ANDERSSON, A ;
NILSSON, S .
INFORMATION PROCESSING LETTERS, 1993, 46 (06) :295-300
[4]   EFFICIENT IMPLEMENTATION OF SUFFIX TREES [J].
ANDERSSON, A ;
NILSSON, S .
SOFTWARE-PRACTICE & EXPERIENCE, 1995, 25 (02) :129-141
[5]  
ANDERSSON A, 1994, P 2 ANN EUR S ALG, V855
[6]  
Apostolico A., 1985, NATO ASI Series, V12, P85, DOI [DOI 10.1007/978-3-642-82456-2_6, 10.1007/978-3-642-82456-26, DOI 10.1007/978-3-642-82456-26]
[7]  
BAEZAYATES RA, 1989, LECT NOTES COMPUT SC, V372, P46
[8]   DYNAMIC PERFECT HASHING - UPPER AND LOWER BOUNDS [J].
DIETZFELBINGER, M ;
KARLIN, A ;
MEHLHORN, K ;
AUFDERHEIDE, FM ;
ROHNERT, H ;
TARJAN, RE .
SIAM JOURNAL ON COMPUTING, 1994, 23 (04) :738-761
[9]   Optimal suffix tree construction with large alphabets [J].
Farach, M .
38TH ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 1997, :137-143
[10]   SURPASSING THE INFORMATION-THEORETIC BOUND WITH FUSION TREES [J].
FREDMAN, ML ;
WILLARD, DE .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 1993, 47 (03) :424-436