Efficient implementation of lazy suffix trees

被引:43
作者
Giegerich, R
Kurtz, S
Stoye, J
机构
[1] Univ Bielefeld, AG Genominformat, Fac Technol, D-33594 Bielefeld, Germany
[2] Univ Hamburg, Ctr Bioinformat, D-20146 Hamburg, Germany
关键词
string matching; suffix tree; space-efficient implementation; lazy evaluation; CONSTRUCTION; TIME;
D O I
10.1002/spe.535
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
We present an efficient implementation of a write-only top-down construction for suffix trees. Our implementation is based on a new, space-efficient representation of suffix trees that requires only 12 bytes per input character in the worst case, and 8.5 bytes per input character on average for a collection of files of different type. We show how to efficiently implement the lazy evaluation of suffix trees such that a subtree is evaluated only when it is traversed for the first time. Our experiments show that for the problem of searching many exact patterns in a fixed input string, the lazy top-down construction is often faster and more space efficient than other methods. Copyright (C) 2003 John Wiley Sons, Ltd.
引用
收藏
页码:1035 / 1049
页数:15
相关论文
共 32 条
[21]  
Karkkainen J, 1995, LECT NOTES COMPUT SC, V937, P191
[22]   REPuter: the manifold applications of repeat analysis on a genomic scale [J].
Kurtz, S ;
Choudhuri, JV ;
Ohlebusch, E ;
Schleiermacher, C ;
Stoye, J ;
Giegerich, R .
NUCLEIC ACIDS RESEARCH, 2001, 29 (22) :4633-4642
[23]  
Kurtz S, 1999, SOFTWARE PRACT EXPER, V29, P1149, DOI 10.1002/(SICI)1097-024X(199911)29:13<1149::AID-SPE274>3.0.CO
[24]  
2-O
[25]   Initial sequencing and analysis of the human genome [J].
Lander, ES ;
Int Human Genome Sequencing Consortium ;
Linton, LM ;
Birren, B ;
Nusbaum, C ;
Zody, MC ;
Baldwin, J ;
Devon, K ;
Dewar, K ;
Doyle, M ;
FitzHugh, W ;
Funke, R ;
Gage, D ;
Harris, K ;
Heaford, A ;
Howland, J ;
Kann, L ;
Lehoczky, J ;
LeVine, R ;
McEwan, P ;
McKernan, K ;
Meldrim, J ;
Mesirov, JP ;
Miranda, C ;
Morris, W ;
Naylor, J ;
Raymond, C ;
Rosetti, M ;
Santos, R ;
Sheridan, A ;
Sougnez, C ;
Stange-Thomann, N ;
Stojanovic, N ;
Subramanian, A ;
Wyman, D ;
Rogers, J ;
Sulston, J ;
Ainscough, R ;
Beck, S ;
Bentley, D ;
Burton, J ;
Clee, C ;
Carter, N ;
Coulson, A ;
Deadman, R ;
Deloukas, P ;
Dunham, A ;
Dunham, I ;
Durbin, R ;
French, L .
NATURE, 2001, 409 (6822) :860-921
[26]   SUFFIX ARRAYS - A NEW METHOD FOR ONLINE STRING SEARCHES [J].
MANBER, U ;
MYERS, G .
SIAM JOURNAL ON COMPUTING, 1993, 22 (05) :935-948
[27]   AN EFFICIENT METHOD FOR FINDING REPEATS IN MOLECULAR SEQUENCES [J].
MARTINEZ, HM .
NUCLEIC ACIDS RESEARCH, 1983, 11 (13) :4629-4634
[28]   SPACE-ECONOMICAL SUFFIX TREE CONSTRUCTION ALGORITHM [J].
MCCREIGHT, EM .
JOURNAL OF THE ACM, 1976, 23 (02) :262-272
[29]   Space efficient suffix trees [J].
Munro, JL ;
Raman, V ;
Rao, SS .
JOURNAL OF ALGORITHMS-COGNITION INFORMATICS AND LOGIC, 2001, 39 (02) :205-222
[30]  
SKIENA SS, 1998, P 2 WORKSH ALG ENG W, P204