An experimental study of a compressed index

被引:23
作者
Ferragina, P
Manzini, G
机构
[1] Univ Pisa, Dipartimento Informat, I-56125 Pisa, Italy
[2] Univ Piemonte Orientale, Dipartimento Sci & Tecnol Avanzate, Alessandria, Italy
[3] CNR, IMC, I-56100 Pisa, Italy
关键词
compressed full-text index; pattern matching on compressed text; Burrows-Wheeler transform; suffix array;
D O I
10.1016/S0020-0255(01)00098-6
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
The size of electronic data is currently growing at a faster rate than computer memory and disk storage capacities. For this reason compression appears always as an attractive choice, if not mandatory. However, space overhead is not the only resource to be optimized when managing large data collections; in fact data turn out to be useful only when properly indexed to support search operations that efficiently extract the user-requested information. Approaches to combine compression and indexing techniques are nowadays receiving more and more attention. A first step towards the design of a compressed full-text index achieving guaranteed performance in the worst case has been recently done in [P. Ferragina, G. Manzini, in: Proceedings of the 41st IEEE Symposium on Foundations of Computer Science, Redondo Beach, CA, 2000, p. 390]. The novelty of the approach resides in the careful combination of the compression algorithm proposed by M. Burrows and D. Wheeler [A block sorting lossless data compression algorithm, Technical Report 124, Digital Equipment Corporation, 1994] with the suffix array data structure [U. Manber, G. Myers, SIAM J. Comput. 22 (5) (1993) p. 935]. The resulting index is opportunistic in that, although no assumption on a particular fixed distribution is made, it takes advantage of the compressibility of the input data by decreasing the space occupancy at no significant asymptotic slowdown in the query performance.
引用
收藏
页码:13 / 28
页数:16
相关论文
共 24 条
[1]   Let sleeping files lie: Pattern matching in Z-compressed files [J].
Amir, A ;
Benson, G ;
Farach, M .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 1996, 52 (02) :299-307
[2]  
BaezaYates R, 2000, J AM SOC INFORM SCI, V51, P69, DOI 10.1002/(SICI)1097-4571(2000)51:1<69::AID-ASI10>3.0.CO
[3]  
2-C
[4]  
Bentley JL, 1997, PROCEEDINGS OF THE EIGHTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P360
[5]   A LOCALLY ADAPTIVE DATA-COMPRESSION SCHEME [J].
BENTLEY, JL ;
SLEATOR, DD ;
TARJAN, RE ;
WEI, VK .
COMMUNICATIONS OF THE ACM, 1986, 29 (04) :320-330
[6]  
CROCHEMORE M, 1999, LECT NOTES COMPUTER, V1644, P261
[7]   String matching in Lempel-Ziv compressed strings [J].
Farach, M ;
Thorup, M .
ALGORITHMICA, 1998, 20 (04) :388-404
[8]   The Burrows-Wheeler transform for block sorting text compression: Principles and improvements [J].
Fenwick, PM .
COMPUTER JOURNAL, 1996, 39 (09) :731-740
[9]   Opportunistic data structures with applications [J].
Ferragina, P ;
Manzini, G .
41ST ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 2000, :390-398
[10]  
Gonnet G.H., 1992, INFORMATION RETRIEVA, P66