Nonparametric entropy estimation for stationary processes and random fields, with applications to English text

被引:162
作者
Kontoyiannis, I [1 ]
Algoet, PH
Suhov, YM
Wyner, AJ
机构
[1] Stanford Univ, Dept Elect Engn, Informat Syst Lab Durand 141A, Stanford, CA 94035 USA
[2] Univ Cambridge, Stat Lab, DPMMS, Cambridge CB2 1SB, England
[3] Russian Acad Sci, Inst Problems Informat Transmiss, Moscow, Russia
[4] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
关键词
entropy rate; entropy of English; pattern matching; universal data compression;
D O I
10.1109/18.669425
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We discuss a family of estimators for the entropy rate of a stationary ergodic process and prove their pointwise and mean consistency under a Doeblin-type mixing condition. The estimators are Cesaro averages of longest match-lengths, and their consistency follows from a generalized ergodic theorem due to Maker. We provide examples of their performance on English text, and we generalize our results to countable alphabet processes and to random fields.
引用
收藏
页码:1319 / 1327
页数:9
相关论文
共 33 条
[1]   THE STRONG LAW OF LARGE NUMBERS FOR SEQUENTIAL DECISIONS UNDER UNCERTAINTY [J].
ALGOET, PH .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1994, 40 (03) :609-633
[2]   THE STRONG ERGODIC THEOREM FOR DENSITIES - GENERALIZED SHANNON-MCMILLAN-BREIMAN THEOREM [J].
BARRON, AR .
ANNALS OF PROBABILITY, 1985, 13 (04) :1292-1303
[3]   THE INDIVIDUAL ERGODIC THEOREM OF INFORMATION-THEORY [J].
BREIMAN, L .
ANNALS OF MATHEMATICAL STATISTICS, 1957, 28 (03) :809-811
[4]  
BREIMAN L, 1960, ANN MATH STAT, V31, P809, DOI 10.1214/aoms/1177705812
[5]  
Chen S., 1993, Proceedings. 34th Annual Symposium on Foundations of Computer Science (Cat. No.93CH3368-8), P104, DOI 10.1109/SFCS.1993.366877
[6]  
CHEN S, 1995, P DCC95 DAT COMPR C, P282
[7]  
Cover T. M., 2005, ELEM INF THEORY, DOI 10.1002/047174882X
[8]  
Doeblin W., 1937, Bulletin mathematique de la Societe Roumaine des Sciences, V39, P3
[9]  
Doeblin W., 1937, Bull. Math. Soc. Roum. Sci., V39, P57
[10]  
FARACH M, 1995, PROCEEDINGS OF THE SIXTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P48