Space efficient linear time construction of suffix arrays

被引:93
作者
Ko, Pang [1 ]
Aluru, Srinivas [1 ,2 ]
机构
[1] Iowa State Univ, Dept Elect & Comp Engn, Ames, IA 50011 USA
[2] Iowa State Univ, Laurence H Baker Ctr Bioinformat & Biol Stat, Ames, IA 50011 USA
关键词
Computational biology; Pattern matching; String algorithms; Suffix array; Suffix sorting;
D O I
10.1016/j.jda.2004.08.002
中图分类号
O29 [应用数学];
学科分类号
070104 ;
摘要
We present a linear time algorithm to sort all the suffixes of a string over a large alphabet of integers. The sorted order of suffixes of a string is also called suffix array, a data structure introduced by Manber and Myers that has numerous applications in pattern matching, string processing, and computational biology. Though the suffix tree of a string can be constructed in linear time and the sorted order of suffixes derived from it, a direct algorithm for suffix sorting is of great interest due to the space requirements of suffix trees. Our result is one of the first linear time suffix array construction algorithms, which improve upon the previously known O(n log n) time direct algorithms for suffix sorting. It can also be used to derive a different linear time construction algorithm for suffix trees. Apart from being simple and applicable for alphabets not necessarily of fixed size, this method of constructing suffix trees is more space efficient. (C) 2004 Elsevier B.V. All rights reserved.
引用
收藏
页码:143 / 156
页数:14
相关论文
共 20 条
[1]  
Abouelhoda M. I., 2002, String Processing and Information Retrieval. 9th International Symposium, SPIRE 2002. Proceedings (Lecture Notes in Computer Science Vol.2476), P31
[2]  
Abouelhoda M. I., 2004, Journal of Discrete Algorithms, V2, P53, DOI 10.1016/S1570-8667(03)00065-0
[3]  
Abouelhoda MI, 2002, LECT NOTES COMPUT SC, V2452, P449
[4]  
Bentley JL, 1997, PROCEEDINGS OF THE EIGHTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P360
[5]   Alignment of whole genomes [J].
Delcher, AL ;
Kasif, S ;
Fleischmann, RD ;
Peterson, J ;
White, O ;
Salzberg, SL .
NUCLEIC ACIDS RESEARCH, 1999, 27 (11) :2369-2376
[6]  
Farach M., 1996, P 23 INT C AUT LANG
[7]  
Gonnet G. H., 1992, INFORMATION RETRIEVA, V66, P82
[8]  
Gusfield D., 1997, ALGORITHMS STRINGS T
[9]  
Hon WK, 2003, ANN IEEE SYMP FOUND, P251
[10]  
Itoh H., 1999, INT S STRING PROC IN, P81