Opportunistic data structures with applications

Cited by: 632
Authors
Ferragina, P [1]
Manzini, G [1]
Affiliation
[1] Univ Pisa, Dipartimento Informat, I-56100 Pisa, Italy
Source
41ST ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS | 2000
Keywords
DOI
10.1109/SFCS.2000.892127
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202 [Computer Software and Theory];
Abstract
In this paper we address the issue of compressing and indexing data. We devise a data structure whose space occupancy is a function of the entropy of the underlying data set. We call the data structure opportunistic since its space occupancy decreases when the input is compressible, and this space reduction is achieved at no significant slowdown in query performance. More precisely, its space occupancy is optimal in an information-content sense because a text T[1, u] is stored using O(H_k(T)) + o(1) bits per input symbol in the worst case, where H_k(T) is the kth order empirical entropy of T (the bound holds for any fixed k). Given an arbitrary string P[1, p], the opportunistic data structure allows searching for the occ occurrences of P in T in O(p + occ log^ε u) time (for any fixed ε > 0). If the data are incompressible, we achieve the best space bound currently known [12]; on compressible data, our solution improves on the succinct suffix array of [12] and on the classical suffix tree and suffix array data structures in space, in query time, or in both. We also study our opportunistic data structure in a dynamic setting and devise a variant achieving effective search and update time bounds. Finally, we show how to plug our opportunistic data structure into the Glimpse tool [19]. The result is an indexing tool which achieves sublinear space and sublinear query time complexity.
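The space bound above is stated in terms of the kth order empirical entropy H_k(T). As an illustration only, the following minimal Python sketch computes that quantity under the commonly used definition, H_k(T) = (1/|T|) Σ_w |T_w| H_0(T_w), where T_w collects the symbols that immediately follow each occurrence of the length-k context w in T. The abstract does not spell out this definition, so the convention (following rather than preceding symbols) and the function names `h0` and `hk` are assumptions made for this sketch; it is not the paper's data structure.

```python
import math
from collections import Counter, defaultdict


def h0(symbols):
    """Zeroth-order empirical entropy of a sequence, in bits per symbol."""
    n = len(symbols)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def hk(text, k):
    """k-th order empirical entropy of text, in bits per symbol.

    Assumed convention: each length-k context w is paired with the
    symbols that immediately follow its occurrences in text.
    """
    if k == 0:
        return h0(text)
    followers = defaultdict(list)
    for i in range(len(text) - k):
        followers[text[i:i + k]].append(text[i + k])
    # Weight each context's zeroth-order entropy by how often it occurs.
    return sum(len(f) * h0(f) for f in followers.values()) / len(text)
```

For the text "abababab", `hk(text, 0)` is 1 bit per symbol, while `hk(text, 1)` is 0 because every symbol is fully determined by its predecessor. This is the kind of gap between H_0 and H_k that a structure with space proportional to H_k(T) can exploit.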
Pages: 390-398
Page count: 9
Related papers
29 entries in total
[1]
Let sleeping files lie: Pattern matching in Z-compressed files [J].
Amir, A ;
Benson, G ;
Farach, M .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 1996, 52 (02) :299-307
[2]
Amir A., 1992, P 2 IEEE DAT COMPR C, P279
[3]
Andersson A., 1996, LNCS, V1097, P185
[4]
[Anonymous], 1998, SORTING SEARCHING
[5]
BaezaYates R, 2000, J AM SOC INFORM SCI, V51, P69, DOI 10.1002/(SICI)1097-4571(2000)51:1<69::AID-ASI10>3.0.CO;2-C
[7]
Bentley J, 1989, PROGRAMMING PEARLS
[8]
A LOCALLY ADAPTIVE DATA-COMPRESSION SCHEME [J].
BENTLEY, JL ;
SLEATOR, DD ;
TARJAN, RE ;
WEI, VK .
COMMUNICATIONS OF THE ACM, 1986, 29 (04) :320-330
[9]
Chen S., 1993, Proceedings. 34th Annual Symposium on Foundations of Computer Science (Cat. No.93CH3368-8), P104, DOI 10.1109/SFCS.1993.366877
[10]
String matching in Lempel-Ziv compressed strings [J].
Farach, M ;
Thorup, M .
ALGORITHMICA, 1998, 20 (04) :388-404