Compressing relations and indexes

Cited by: 91
Authors
Goldstein, J [1 ]
Ramakrishnan, R [1 ]
Shaft, U [1 ]
Affiliation
[1] Univ Wisconsin, Dept Comp Sci, Madison, WI 53706 USA
Source
14TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS | 1998
Keywords
DOI
10.1109/ICDE.1998.655800
Chinese Library Classification (CLC) number
TP [automation technology, computer technology];
Discipline classification code
0812;
Abstract
We propose a new compression algorithm that is tailored to database applications. It can be applied to a collection of records, and is especially effective for records with many low to medium cardinality fields and numeric fields. In addition, this new technique supports very fast decompression. Promising application domains include decision support systems (DSS), since "fact tables", which are by far the largest tables in these applications, contain many low and medium cardinality fields and typically no text fields. Further, our decompression rates are faster than typical disk throughputs for sequential scans; in contrast, gzip is slower. This is important in DSS applications, which often scan large ranges of records. An important distinguishing characteristic of our algorithm, in contrast to compression algorithms proposed earlier, is that we can decompress individual tuples (even individual fields), rather than a full page (or an entire relation) at a time. Also, all the information needed for tuple decompression resides on the same page with the tuple. This means that a page can be stored in the buffer pool and used in compressed form, simplifying the job of the buffer manager and improving memory utilization. Our compression algorithm also improves index structures such as B-trees and R-trees significantly by reducing the number of leaf pages and compressing index entries, which greatly increases the fan-out. We can also use lossy compression on the internal nodes of an index.
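The key property highlighted in the abstract is that every piece of metadata needed to decompress a tuple lives on the tuple's own page, so a single tuple or even a single field can be decoded without decompressing the whole page. The sketch below is only an illustration of that property under an assumed page-local, per-field dictionary encoding for low-cardinality fields; it is not the paper's actual algorithm, and the class and method names (CompressedPage, get_field, get_tuple) are hypothetical.

```python
# Illustrative sketch only, NOT the paper's algorithm: each page keeps its own
# per-field dictionaries, so any single tuple (or single field) can be decoded
# with a couple of array lookups, without touching other pages.

class CompressedPage:
    def __init__(self, tuples):
        # Build one small dictionary per field from the values on this page.
        # This pays off for low/medium cardinality fields, as the abstract notes.
        num_fields = len(tuples[0])
        self.dicts = []   # per field: code -> value
        self.codes = []   # per field: value -> code
        for f in range(num_fields):
            values = sorted({t[f] for t in tuples})
            self.dicts.append(values)
            self.codes.append({v: i for i, v in enumerate(values)})
        # Each stored row is just a list of small integer codes.
        self.rows = [[self.codes[f][t[f]] for f in range(num_fields)]
                     for t in tuples]

    def get_field(self, row, field):
        # Decode one field of one tuple: a single dictionary lookup.
        return self.dicts[field][self.rows[row][field]]

    def get_tuple(self, row):
        # Decode one tuple without decompressing the rest of the page.
        return tuple(self.dicts[f][code]
                     for f, code in enumerate(self.rows[row]))


if __name__ == "__main__":
    page = CompressedPage([
        ("WI", 1998, "ICDE"),
        ("WI", 1998, "VLDB"),
        ("CA", 1997, "ICDE"),
    ])
    print(page.get_tuple(2))     # ('CA', 1997, 'ICDE')
    print(page.get_field(1, 2))  # 'VLDB'
```

Because the dictionaries travel with the page, a page like this could remain compressed in the buffer pool and still serve point lookups, which is the behavior the abstract attributes to the proposed scheme.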
Pages: 370-379
Page count: 10