A universal predictor based on pattern matching

被引:43
作者
Jacquet, P
Szpankowski, W
Apostol, I
机构
[1] Inst Natl Rech Informat & Automat, F-78153 Le Chesnay, France
[2] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
[3] Amgen Inc, Thousand Oaks, CA 91320 USA
基金
美国国家科学基金会;
关键词
context selection; optimal predictor; pattern matching; redundancy of universal predictors; sequential decision; suffix trees; universal predictor; universal source coding;
D O I
10.1109/TIT.2002.1003834
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We consider a universal predictor based on pattern matching. Given a sequence X-1,...,X-n drawn from a stationary mixing source, it predicts the next symbol Xn+1 based on selecting a context of Xn+1. The predictor, called the Sampled Pattern Matching (SPM), is a modification of the Ehrenfeucht-Mycielski pseudorandom generator algorithm. It predicts the value of the most frequent symbol appearing at the so-called sampled positions. These positions follow the occurrences of a fraction of the longest suffix of the original sequence that has another copy inside X1X2...X-n; that is, in SPM, the context selection consists of taking certain fraction of the longest match. The study of the longest match for lossless data compression was initiated by Wyner and Ziv in their 1989 seminal paper. Here, we estimate the redundancy of the SPM universal predictor, that is, we prove that the probability the SPM predictor makes worse decisions than the optimal predictor is O (n(-nu)) for some 0 < v < 1/2 as n --> infinity. As a matter of fact, we show that we can predict K = O(1) symbols with the same probability of error.
引用
收藏
页码:1462 / 1472
页数:11
相关论文
共 30 条
[21]  
Ryabko B. Y., 1988, PROBLEMY PEREDACHI I, V24, P3
[22]  
RYABKO BY, 1984, PROBL INFORM TRA JUL, P173
[23]   PREDICTION AND ENTROPY OF PRINTED ENGLISH [J].
SHANNON, CE .
BELL SYSTEM TECHNICAL JOURNAL, 1951, 30 (01) :50-64
[24]  
SHIELD SP, 1996, EROGODIC THEORY DISC
[25]   A GENERALIZED SUFFIX TREE AND ITS (UN)EXPECTED ASYMPTOTIC BEHAVIORS [J].
SZPANKOWSKI, W .
SIAM JOURNAL ON COMPUTING, 1993, 22 (06) :1176-1198
[26]  
Szpankowski W., 2001, AVERAGE CASE ANAL AL
[27]   Optimal prefetching via data compression [J].
Vitter, JS ;
Krishnan, P .
JOURNAL OF THE ACM, 1996, 43 (05) :771-793
[28]   A UNIVERSAL FINITE MEMORY SOURCE [J].
WEINBERGER, MJ ;
RISSANEN, JJ ;
FEDER, M .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1995, 41 (03) :643-652
[29]   SOME ASYMPTOTIC PROPERTIES OF THE ENTROPY OF A STATIONARY ERGODIC DATA SOURCE WITH APPLICATIONS TO DATA-COMPRESSION [J].
WYNER, AD ;
ZIV, J .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1989, 35 (06) :1250-1258
[30]   On the redundancy of the fixed-database Lempel-Ziv algorithm for phi-mixing sources [J].
Yang, EH ;
Kieffer, JC .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1997, 43 (04) :1101-1111