Estimation of entropy and mutual information

Cited by: 924
Authors
Paninski, L. [1]
Affiliations
[1] New York University, Center for Neural Science, New York, NY 10003, USA
DOI
10.1162/089976603321780272
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We present some new results on the nonparametric estimation of entropy and mutual information. First, we use an exact local expansion of the entropy function to prove almost sure consistency and central limit theorems for three of the most commonly used discretized information estimators. The setup is related to Grenander's method of sieves and places no assumptions on the underlying probability measure generating the data. Second, we prove a converse to these consistency theorems, demonstrating that a misapplication of the most common estimation techniques leads to an arbitrarily poor estimate of the true information, even given unlimited data. This "inconsistency" theorem leads to an analytical approximation of the bias, valid in surprisingly small sample regimes and more accurate than the usual 1/N formula of Miller and Madow over a large region of parameter space. The two most practical implications of these results are negative: (1) information estimates in a certain data regime are likely contaminated by bias, even if "bias-corrected" estimators are used, and (2) confidence intervals calculated by standard techniques drastically underestimate the error of the most common estimation methods. Finally, we note a very useful connection between the bias of entropy estimators and a certain polynomial approximation problem. By casting bias calculation problems in this approximation theory framework, we obtain the best possible generalization of known asymptotic bias results. More interestingly, this framework leads to an estimator with some nice properties: the estimator comes equipped with rigorous bounds on the maximum error over all possible underlying probability distributions, and this maximum error turns out to be surprisingly small. We demonstrate the application of this new estimator on both real and simulated data.
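To make the abstract's central objects concrete, here is a minimal sketch (assuming Python with NumPy; the function names plugin_entropy and miller_madow_entropy and the uniform-distribution test case are illustrative choices, not taken from the paper) of the plug-in discretized entropy estimator and the classical Miller-Madow 1/N-type bias correction, together with a small Monte Carlo run in the regime the paper warns about, where the sample size N is comparable to the alphabet size m:

```python
import numpy as np

def plugin_entropy(counts):
    """Maximum-likelihood ("plug-in") entropy estimate, in bits.

    This is the naive discretized estimator: H_hat = -sum_i p_i log2(p_i),
    with p_i = n_i / N the empirical frequencies.
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    p = counts[counts > 0] / n
    return -np.sum(p * np.log2(p))

def miller_madow_entropy(counts):
    """Plug-in estimate plus the Miller-Madow bias correction.

    The first-order bias of the plug-in estimator is approximately
    -(m_hat - 1) / (2N) nats, where m_hat is the number of symbols
    actually observed; adding that back gives the corrected estimate.
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    m_observed = np.count_nonzero(counts)
    correction = (m_observed - 1) / (2.0 * n) / np.log(2)  # nats -> bits
    return plugin_entropy(counts) + correction

# Monte Carlo illustration of the N ~ m regime, where even the
# "bias-corrected" estimator remains substantially biased.
rng = np.random.default_rng(0)
m, n_samples, n_trials = 1000, 1000, 200
p_true = np.ones(m) / m                 # uniform: true H = log2(m) bits
h_true = np.log2(m)

plugin_vals, mm_vals = [], []
for _ in range(n_trials):
    counts = rng.multinomial(n_samples, p_true)
    plugin_vals.append(plugin_entropy(counts))
    mm_vals.append(miller_madow_entropy(counts))

print(f"true entropy        : {h_true:.3f} bits")
print(f"plug-in (mean)      : {np.mean(plugin_vals):.3f} bits")
print(f"Miller-Madow (mean) : {np.mean(mm_vals):.3f} bits")
```

In a run like this, the plug-in estimator typically understates the true entropy of roughly 9.97 bits by close to a bit, and the Miller-Madow correction closes only about half of that gap, which is the practical content of the abstract's warning about bias-corrected estimators and of its inconsistency result; the paper's own minimax estimator, built via the polynomial approximation framework, is not sketched here.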
Pages: 1191-1253 (63 pages)
References (53 total)
  • [1] Anonymous. (1967). Tohoku Mathematical Journal (Second Series).
  • [2] Anonymous. (1995). Theory of Statistics.
  • [3] Anonymous. (1955). Information Theory in Psychology.
  • [4] Antos, A., & Kontoyiannis, I. (2001). Convergence properties of functional estimates for discrete distributions. Random Structures & Algorithms, 19(3-4), 163-193.
  • [5] Basharin, G. P. (1959). Theory of Probability and Its Applications, 4, 333.
  • [6] Beirlant, J. (1997). International Journal of Mathematical and Statistical Sciences, 6, 17.
  • [7] Bialek, W., Rieke, F., de Ruyter van Steveninck, R. R., & Warland, D. (1991). Reading a neural code. Science, 252(5014), 1854-1857.
  • [8] Billingsley, P. (1965). Ergodic Theory and Information.
  • [9] Buracas, G. T., Zador, A. M., DeWeese, M. R., & Albright, T. D. (1998). Efficient discrimination of temporal patterns by motion-sensitive neurons in primate visual cortex. Neuron, 20(5), 959-969.
  • [10] Carlton, A. G. (1969). On the bias of information estimates. Psychological Bulletin, 71(2), 108.