SUFFIX ARRAYS - A NEW METHOD FOR ONLINE STRING SEARCHES

被引:976
作者
MANBER, U
MYERS, G
机构
[1] Univ of Arizona, Tucson, AZ
关键词
STRING SEARCHING; STRING MATCHING; PATTERN MATCHING; SUFFIX TREES; ALGORITHMS; TEXT INDEXING; INVERTED INDEXES;
D O I
10.1137/0222058
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
A new and conceptually simple data structure, called a suffix array, for on-line string searches is introduced in this paper. Constructing and querying suffix arrays is reduced to a sort and search paradigm that employs novel algorithms. The main advantage of suffix arrays over suffix trees is that, in practice, they use three to five times less space. From a complexity standpoint, suffix arrays permit on-line string searches of the type, ''Is W a substring of A?'' to be answered in time O(P + log N), where P is the length of W and N is the length of A, which is competitive with (and in some cases slightly better than) suffix trees. The only drawback is that in those instances where the underlying alphabet is finite and small, suffix trees can be constructed in O(N) time in the worst case, versus O(N log N) time for suffix arrays. However, an augmented algorithm is given that, regardless of the alphabet size, constructs suffix arrays in O(N) expected time, albeit with lesser space efficiency. It is believed that suffix arrays will prove to be better in practice than suffix trees for many applications.
引用
收藏
页码:935 / 948
页数:14
相关论文
共 30 条
[1]   STRUCTURAL-PROPERTIES OF THE STRING STATISTICS PROBLEM [J].
APOSTOLICO, A ;
PREPARATA, FP .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 1985, 31 (03) :394-411
[2]   PARALLEL CONSTRUCTION OF A SUFFIX TREE WITH APPLICATIONS [J].
APOSTOLICO, A ;
ILIOPOULOS, C ;
LANDAU, GM ;
SCHIEBER, B ;
VISHKIN, U .
ALGORITHMICA, 1988, 3 (03) :347-365
[3]   OPTIMAL OFF-LINE DETECTION OF REPETITIONS IN A STRING [J].
APOSTOLICO, A ;
PREPARATA, FP .
THEORETICAL COMPUTER SCIENCE, 1983, 22 (03) :297-315
[4]  
Apostolico A., 1985, NATO ASI SERIES F, V12, P85
[5]   IMPROVING QUICKSORT PERFORMANCE WITH A CODEWORD DATA STRUCTURE [J].
BAER, JL ;
LIN, YB .
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 1989, 15 (05) :622-631
[6]  
BAEZAYATES RA, 1990, ALL ALL SEQUENCE MAT
[7]   THE SMALLEST AUTOMATION RECOGNIZING THE SUBWORDS OF A TEXT [J].
BLUMER, A ;
BLUMER, J ;
HAUSSLER, D ;
EHRENFEUCHT, A ;
CHEN, MT ;
SEIFERAS, J .
THEORETICAL COMPUTER SCIENCE, 1985, 40 (01) :31-55
[8]  
BLUMER J, 1986, UCSCCRL8611 U COL DE
[9]   ANALYSIS AND PERFORMANCE OF INVERTED DATA BASE STRUCTURES [J].
CARDENAS, AF .
COMMUNICATIONS OF THE ACM, 1975, 18 (05) :253-263
[10]  
CHANG WI, 1990, 31ST P S F COMP SCI, P116