COMPRESSION OF CORRELATED BIT-VECTORS

被引:23
作者
BOOKSTEIN, A
KLEIN, ST
机构
[1] Center for Information and Language Studies, University of Chicago, Chicago, IL 60637
关键词
DATA COMPRESSION; BITMAP COMPRESSION; DATA STORAGE; MAXIMUM SPANNING TREE; APPLICATION; CLUSTERING;
D O I
10.1016/0306-4379(91)90030-D
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Bitmaps are data structures occurring often in information retrieval. They are useful, but are also large and expensive to store. For this reason, considerable effort has been devoted to finding techniques for compressing them. These techniques are most effective for sparse bitmaps. We propose a preprocessing stage, in which bitmaps are first clustered and the clusters used to transform their member bitmaps into sparser ones, that can be more effectively compressed. The clustering method efficiently generates a graph structure on the bitmaps. In some situations, it is desired to impose restrictions on the graph; finding the optimal graph satisfying these restrictions is shown to be NP-complete. The results of applying our algorithm to the Bible is presented: for some sets of bitmaps, our method almost doubled in the compression savings.
引用
收藏
页码:387 / 400
页数:14
相关论文
共 20 条
[11]  
Lelewer D. A., 1987, ACM COMPUT SURV, V19, P261
[12]   COMPRESSION OF LARGE INVERTED FILES WITH HYPERBOLIC TERM DISTRIBUTION [J].
SCHUEGRAF, EJ .
INFORMATION PROCESSING & MANAGEMENT, 1976, 12 (06) :377-384
[13]   MATHEMATICAL-ANALYSIS OF VARIOUS SUPERIMPOSED CODING METHODS [J].
STIASSNY, S .
AMERICAN DOCUMENTATION, 1960, 11 (02) :155-169
[14]  
Storer J.A., 1988, DATA COMPRESSION MET
[15]   COMPRESSION METHOD FOR CLUSTERED BIT-VECTORS [J].
TEUHOLA, J .
INFORMATION PROCESSING LETTERS, 1978, 7 (06) :308-311
[16]  
VALLARINO O, 1976, SIGPLAN NOTICES, V2, P108
[17]  
WEDEKING H, 1976, DATENBANKSYSTEME, V2
[18]  
Yao A. C., 1975, Information Processing Letters, V4, P21, DOI 10.1016/0020-0190(75)90056-3
[19]  
[No title captured]
[20]  
[No title captured]