A universal predictor based on pattern matching

被引:43
作者
Jacquet, P
Szpankowski, W
Apostol, I
机构
[1] Inst Natl Rech Informat & Automat, F-78153 Le Chesnay, France
[2] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
[3] Amgen Inc, Thousand Oaks, CA 91320 USA
基金
美国国家科学基金会;
关键词
context selection; optimal predictor; pattern matching; redundancy of universal predictors; sequential decision; suffix trees; universal predictor; universal source coding;
D O I
10.1109/TIT.2002.1003834
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We consider a universal predictor based on pattern matching. Given a sequence X-1,...,X-n drawn from a stationary mixing source, it predicts the next symbol Xn+1 based on selecting a context of Xn+1. The predictor, called the Sampled Pattern Matching (SPM), is a modification of the Ehrenfeucht-Mycielski pseudorandom generator algorithm. It predicts the value of the most frequent symbol appearing at the so-called sampled positions. These positions follow the occurrences of a fraction of the longest suffix of the original sequence that has another copy inside X1X2...X-n; that is, in SPM, the context selection consists of taking certain fraction of the longest match. The study of the longest match for lossless data compression was initiated by Wyner and Ziv in their 1989 seminal paper. Here, we estimate the redundancy of the SPM universal predictor, that is, we prove that the probability the SPM predictor makes worse decisions than the optimal predictor is O (n(-nu)) for some 0 < v < 1/2 as n --> infinity. As a matter of fact, we show that we can predict K = O(1) symbols with the same probability of error.
引用
收藏
页码:1462 / 1472
页数:11
相关论文
共 30 条
[1]   UNIVERSAL SCHEMES FOR PREDICTION, GAMBLING AND PORTFOLIO SELECTION [J].
ALGOET, P .
ANNALS OF PROBABILITY, 1992, 20 (02) :901-941
[2]   THE STRONG LAW OF LARGE NUMBERS FOR SEQUENTIAL DECISIONS UNDER UNCERTAINTY [J].
ALGOET, PH .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1994, 40 (03) :609-633
[3]  
[Anonymous], SURVEYS COMBINATORIC
[4]  
BILLINGSLEY P., 1999, Convergence of Probability Measures, V2nd, DOI 10.1002/9780470316962
[5]   DATA-COMPRESSION USING ADAPTIVE CODING AND PARTIAL STRING MATCHING [J].
CLEARY, JG ;
WITTEN, IH .
IEEE TRANSACTIONS ON COMMUNICATIONS, 1984, 32 (04) :396-402
[6]  
Cover T. M., 2005, ELEM INF THEORY, DOI 10.1002/047174882X
[7]  
Cover T. M., 1965, PROC 4 PRAGUE C INFO, P263
[8]   A PSEUDORANDOM SEQUENCE - HOW RANDOM IS IT [J].
EHRENFEUCHT, A ;
MYCIELSKI, J .
AMERICAN MATHEMATICAL MONTHLY, 1992, 99 (04) :373-375
[9]   UNIVERSAL PREDICTION OF INDIVIDUAL SEQUENCES [J].
FEDER, M ;
MERHAV, N ;
GUTMAN, M .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1992, 38 (04) :1258-1270
[10]  
Gusfield D, 1997, ALGORITHMS STRINGS T