Source coding, large deviations, and approximate pattern matching

被引:62
作者
Dembo, A [1 ]
Kontoyiannis, I
机构
[1] Stanford Univ, Dept Math, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[3] Brown Univ, Div Appl Math, Providence, RI 02912 USA
[4] Brown Univ, Dept Comp Sci, Providence, RI 02912 USA
基金
美国国家科学基金会;
关键词
data compression; large deviations; pattern-matching; rate-distortion theory;
D O I
10.1109/TIT.2002.1003841
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 [计算机科学与技术];
摘要
In this review paper, we present a development of parts of rate-distortion theory and pattern-matching algorithms for lossy data compression, centered around a lossy version of the asymptotic equipartition property (AEP). This treatment closely parallels the corresponding development in lossless compression, a point of view that was advanced in an important paper of Wyner and Ziv in 1989. In the lossless case, we review how the AEP underlies the analysis of the Lempel-Ziv algorithm by viewing it as a random code and reducing it to the idealized Shannon code. This also provides information about the redundancy of the Lempel-Ziv algorithm and about the asymptotic behavior of several relevant quantities. In the lossy case, we give various versions of the statement of the generalized AEP and we outline the general methodology of its proof via large deviations. Its relationship with Barron and Orey's generalized AEP is also discussed. The lossy AEP is applied to I) prove strengthened versions of Shannon's direct source-coding theorem and universal coding theorems; ii) characterize the performance of "mismatched" codebooks in lossy data compression; iii) analyze the performance of pattern-matching algorithms for lossy compression (including Lempel-Ziv schemes); and iv) determine the first-order asymptotic of waiting times between stationary processes. A refinement to the lossy AEP is then presented, and it is used to i) prove second-order (direct and converse) lossy source-coding theorems, including universal coding theorems; ii) characterize which sources are quantitatively easier to compress; iii) determine the second-order asymptotic of waiting times between stationary processes; and iv) determine the precise asymptotic behavior of longest match-lengths between stationary processes. Finally, we discuss extensions of the above framework and results to random fields.
引用
收藏
页码:1590 / 1615
页数:26
相关论文
共 99 条
[1]
8Doukhan P., 2012, Mixing: Properties and Examples, V85
[2]
2D-pattern matching image and video compression: Theory, algorithms, and experiments [J].
Alzina, M ;
Szpankowski, W ;
Grama, A .
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2002, 11 (03) :318-331
[3]
[Anonymous], 1971, RATE DISTORTION THEO
[4]
[Anonymous], KODAI MATH SEM REP
[5]
[Anonymous], 1995, RANDOM FIELDS NETWOR
[6]
[Anonymous], 1971, KODAI MATH SEM REP
[7]
THE ERDOS-RENYI LAW IN DISTRIBUTION, FOR COIN TOSSING AND SEQUENCE MATCHING [J].
ARRATIA, R ;
GORDON, L ;
WATERMAN, MS .
ANNALS OF STATISTICS, 1990, 18 (02) :539-570
[8]
THE ERDOS-RENYI STRONG LAW FOR PATTERN-MATCHING WITH A GIVEN PROPORTION OF MISMATCHES [J].
ARRATIA, R ;
WATERMAN, MS .
ANNALS OF PROBABILITY, 1989, 17 (03) :1152-1169
[9]
`A PHASE TRANSITION FOR THE SCORE IN MATCHING RANDOM SEQUENCES ALLOWING DELETIONS [J].
Arratia, Richard ;
Waterman, Michael S. .
ANNALS OF APPLIED PROBABILITY, 1994, 4 (01) :200-225
[10]
Pattern matching image compression:: Algorithmic and empirical results [J].
Atallah, M ;
Génin, Y ;
Szpankowski, W .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1999, 21 (07) :614-627