Universal lossless source coding with the Burrows Wheeler Transform

被引:60
作者
Effros, M [1 ]
Visweswariah, K
Kulkarni, SR
Verdú, S
机构
[1] CALTECH, Dept Elect Engn MC 136 93, Pasadena, CA 91125 USA
[2] Princeton Univ, Dept Elect Engn, Princeton, NJ 08544 USA
基金
美国国家科学基金会;
关键词
Burrows Wheeler Transform (BWT); rate of convergence; redundancy; text compression; universal noiseless source coding;
D O I
10.1109/18.995542
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
The Burrows Wheeler Transform (BWT) is a reversible sequence transformation used in a variety of practical lossless source-coding algorithms. In each, the BWT is followed by a lossless source code that attempts to exploit the natural ordering of the BWT coefficients. BWT-based compression schemes are widely touted as low-complexity algorithms giving lossless coding rates better than those of the Ziv-Lempel codes (commonly known as LZ'77 and 1278) and almost as good as those achieved by prediction by partial matching (PPM) algorithms. To date, the coding performance claims have been made primarily on the basis of experimental results. This work gives a theoretical evaluation of BWT-based coding. The main results of this theoretical evaluation include: 1) statistical characterizations of the BWT output on both finite strings and sequences of length n --> infinity, 2) a variety of very simple new techniques for BWT-based lossless source coding, and 3) proofs of the universality and bounds on the rates of convergence of both new and existing BWT-based codes for finite-memory and stationary ergodic sources. The end result is a theoretical justification and validation of the experimentally derived conclusions: BWT-based lossless source codes achieve universal lossless coding performance that converges to the optimal coding performance more quickly than the rate of convergence observed in Ziv-Lempel style codes and, for some BWT-based codes, within a constant factor of the optimal rate of convergence for finite-memory sources.
引用
收藏
页码:1061 / 1081
页数:21
相关论文
共 49 条
[1]  
Arimura M, 1998, IEICE T FUND ELECTR, VE81A, P2117
[2]  
ARIMURA M, 1998, P INT S INF THEOR IT, P286
[3]  
ARIMURA M, 1999, THESIS U TOK TOK JAP
[4]   Lexical permutation sorting algorithm [J].
Arnavut, Z ;
Magliveras, AS .
COMPUTER JOURNAL, 1997, 40 (05) :292-295
[5]   Block sorting transformations [J].
Arnavut, Z ;
Leavitt, D ;
Abdulazizoglu, M .
DCC '98 - DATA COMPRESSION CONFERENCE, 1998, :524-524
[6]   A LOCALLY ADAPTIVE DATA-COMPRESSION SCHEME [J].
BENTLEY, JL ;
SLEATOR, DD ;
TARJAN, RE ;
WEI, VK .
COMMUNICATIONS OF THE ACM, 1986, 29 (04) :320-330
[7]  
BENTLEY JL, 1984, P 22 ALL C COMM CONT, P233
[8]  
BURROWS M, 1994, 124 SRC
[9]   Higher compression from the Burrows-Wheeler transform by modified sorting [J].
Chapin, B ;
Tate, SR .
DCC '98 - DATA COMPRESSION CONFERENCE, 1998, :532-532
[10]   A vector quantization approach to universal noiseless coding and quantization [J].
Chon, PA ;
Effros, M ;
Gray, RM .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1996, 42 (04) :1109-1138