Estimation of entropy and mutual information

Cited by: 924
Authors
Paninski, L. [1]
Affiliations
[1] New York University, Center for Neural Science, New York, NY 10003, USA
DOI
10.1162/089976603321780272
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We present some new results on the nonparametric estimation of entropy and mutual information. First, we use an exact local expansion of the entropy function to prove almost sure consistency and central limit theorems for three of the most commonly used discretized information estimators. The setup is related to Grenander's method of sieves and places no assumptions on the underlying probability measure generating the data. Second, we prove a converse to these consistency theorems, demonstrating that a misapplication of the most common estimation techniques leads to an arbitrarily poor estimate of the true information, even given unlimited data. This "inconsistency" theorem leads to an analytical approximation of the bias, valid in surprisingly small sample regimes and more accurate than the usual 1/N formula of Miller and Madow over a large region of parameter space. The two most practical implications of these results are negative: (1) information estimates in a certain data regime are likely contaminated by bias, even if "bias-corrected" estimators are used, and (2) confidence intervals calculated by standard techniques drastically underestimate the error of the most common estimation methods. Finally, we note a very useful connection between the bias of entropy estimators and a certain polynomial approximation problem. By casting bias calculation problems in this approximation theory framework, we obtain the best possible generalization of known asymptotic bias results. More interestingly, this framework leads to an estimator with some nice properties: the estimator comes equipped with rigorous bounds on the maximum error over all possible underlying probability distributions, and this maximum error turns out to be surprisingly small. We demonstrate the application of this new estimator on both real and simulated data.
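To make the abstract's central objects concrete, here is a minimal sketch (assuming Python with NumPy; the function names plugin_entropy and miller_madow_entropy and the uniform-distribution test case are illustrative choices, not taken from the paper) of the plug-in discretized entropy estimator and the classical Miller-Madow 1/N-type bias correction, together with a small Monte Carlo run in the regime the paper warns about, where the sample size N is comparable to the alphabet size m:

```python
import numpy as np

def plugin_entropy(counts):
    """Maximum-likelihood ("plug-in") entropy estimate, in bits.

    This is the naive discretized estimator: H_hat = -sum_i p_i log2(p_i),
    with p_i = n_i / N the empirical frequencies.
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    p = counts[counts > 0] / n
    return -np.sum(p * np.log2(p))

def miller_madow_entropy(counts):
    """Plug-in estimate plus the Miller-Madow bias correction.

    The first-order bias of the plug-in estimator is approximately
    -(m_hat - 1) / (2N) nats, where m_hat is the number of symbols
    actually observed; adding that back gives the corrected estimate.
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    m_observed = np.count_nonzero(counts)
    correction = (m_observed - 1) / (2.0 * n) / np.log(2)  # nats -> bits
    return plugin_entropy(counts) + correction

# Monte Carlo illustration of the N ~ m regime, where even the
# "bias-corrected" estimator remains substantially biased.
rng = np.random.default_rng(0)
m, n_samples, n_trials = 1000, 1000, 200
p_true = np.ones(m) / m                 # uniform: true H = log2(m) bits
h_true = np.log2(m)

plugin_vals, mm_vals = [], []
for _ in range(n_trials):
    counts = rng.multinomial(n_samples, p_true)
    plugin_vals.append(plugin_entropy(counts))
    mm_vals.append(miller_madow_entropy(counts))

print(f"true entropy        : {h_true:.3f} bits")
print(f"plug-in (mean)      : {np.mean(plugin_vals):.3f} bits")
print(f"Miller-Madow (mean) : {np.mean(mm_vals):.3f} bits")
```

In a run like this, the plug-in estimator typically understates the true entropy of roughly 9.97 bits by close to a bit, and the Miller-Madow correction closes only about half of that gap, which is the practical content of the abstract's warning about bias-corrected estimators and of its inconsistency result; the paper's own minimax estimator, built via the polynomial approximation framework, is not sketched here.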
Pages: 1191-1253 (63 pages)
References (53 total)
  • [1] Anonymous. (1967). Tohoku Mathematical Journal (Second Series).
  • [2] Anonymous. (1995). Theory of Statistics.
  • [3] Anonymous. (1955). Information Theory in Psychology.
  • [4] Antos, A., & Kontoyiannis, I. (2001). Convergence properties of functional estimates for discrete distributions. Random Structures & Algorithms, 19(3-4), 163-193.
  • [5] Basharin, G. P. (1959). Theory of Probability and Its Applications, 4, 333.
  • [6] Beirlant, J. (1997). International Journal of Mathematical and Statistical Sciences, 6, 17.
  • [7] Bialek, W., Rieke, F., de Ruyter van Steveninck, R. R., & Warland, D. (1991). Reading a neural code. Science, 252(5014), 1854-1857.
  • [8] Billingsley, P. (1965). Ergodic Theory and Information.
  • [9] Buracas, G. T., Zador, A. M., DeWeese, M. R., & Albright, T. D. (1998). Efficient discrimination of temporal patterns by motion-sensitive neurons in primate visual cortex. Neuron, 20(5), 959-969.
  • [10] Carlton, A. G. (1969). On the bias of information estimates. Psychological Bulletin, 71(2), 108.