COMPRESSION OF CORRELATED BIT-VECTORS

被引:23
作者
BOOKSTEIN, A
KLEIN, ST
机构
[1] Center for Information and Language Studies, University of Chicago, Chicago, IL 60637
关键词
DATA COMPRESSION; BITMAP COMPRESSION; DATA STORAGE; MAXIMUM SPANNING TREE; APPLICATION; CLUSTERING;
D O I
10.1016/0306-4379(91)90030-D
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Bitmaps are data structures occurring often in information retrieval. They are useful, but are also large and expensive to store. For this reason, considerable effort has been devoted to finding techniques for compressing them. These techniques are most effective for sparse bitmaps. We propose a preprocessing stage, in which bitmaps are first clustered and the clusters used to transform their member bitmaps into sparser ones, that can be more effectively compressed. The clustering method efficiently generates a graph structure on the bitmaps. In some situations, it is desired to impose restrictions on the graph; finding the optimal graph satisfying these restrictions is shown to be NP-complete. The results of applying our algorithm to the Bible is presented: for some sets of bitmaps, our method almost doubled in the compression savings.
引用
收藏
页码:387 / 400
页数:14
相关论文
共 20 条
[1]  
Abramowitz M., 1965, HDB MATH FUNCTIONS
[2]  
BELL T, 1989, COMPUT SURV, V21, P557, DOI 10.1145/76894.76896
[3]  
BOOKSTEIN A, 1990, IN PRESS INF PROCESS
[4]  
CHOUEKA Y, 1986, 9TH P ACM SIGIR C PI, P88
[5]   SIGNATURE FILES - AN ACCESS METHOD FOR DOCUMENTS AND ITS ANALYTICAL PERFORMANCE EVALUATION [J].
FALOUTSOS, C ;
CHRISTODOULAKIS, S .
ACM TRANSACTIONS ON OFFICE INFORMATION SYSTEMS, 1984, 2 (04) :267-288
[6]  
Garey M.R., 1979, COMPUTERS INTRACTABI, V174
[7]   HUFFMAN CODING IN BIT-VECTOR COMPRESSION [J].
JAKOBSSON, M .
INFORMATION PROCESSING LETTERS, 1978, 7 (06) :304-307
[8]  
Joseph J., 1956, P AM MATH SOC, V7, P48, DOI [DOI 10.1090/S0002-9939-1956-0078686-7, 10.2307/2033241]
[9]  
KLEIN ST, 1989, ACM T INFORM SYST, V7, P230, DOI 10.1145/75335.75351
[10]  
Knuth D. E., 2011, ART COMPUTER PROGRAM, V4