A universal predictor based on pattern matching

被引：43

作者：

Jacquet, P

Szpankowski, W

Apostol, I

机构：

[1] Inst Natl Rech Informat & Automat, F-78153 Le Chesnay, France

[2] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA

[3] Amgen Inc, Thousand Oaks, CA 91320 USA

来源：

IEEE TRANSACTIONS ON INFORMATION THEORY | 2002年 / 48卷 / 06期

基金：

美国国家科学基金会;

关键词：

context selection; optimal predictor; pattern matching; redundancy of universal predictors; sequential decision; suffix trees; universal predictor; universal source coding;

D O I：

10.1109/TIT.2002.1003834

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

We consider a universal predictor based on pattern matching. Given a sequence X-1,...,X-n drawn from a stationary mixing source, it predicts the next symbol Xn+1 based on selecting a context of Xn+1. The predictor, called the Sampled Pattern Matching (SPM), is a modification of the Ehrenfeucht-Mycielski pseudorandom generator algorithm. It predicts the value of the most frequent symbol appearing at the so-called sampled positions. These positions follow the occurrences of a fraction of the longest suffix of the original sequence that has another copy inside X1X2...X-n; that is, in SPM, the context selection consists of taking certain fraction of the longest match. The study of the longest match for lossless data compression was initiated by Wyner and Ziv in their 1989 seminal paper. Here, we estimate the redundancy of the SPM universal predictor, that is, we prove that the probability the SPM predictor makes worse decisions than the optimal predictor is O (n(-nu)) for some 0 < v < 1/2 as n --> infinity. As a matter of fact, we show that we can predict K = O(1) symbols with the same probability of error.

引用

页码：1462 / 1472

页数：11

共 30 条

[11]

Jacquet P, 2000, TRENDS MATH, P75

[12]

JACQUET P, 1999, AVERAGE CASE ANAL PA

[13]

KIEFFER J, 1998, PREDICTION INFORMATI

[14] A suboptimal lossy data compression based on approximate pattern matching [J].