Variational Inference: A Review for Statisticians

Cited by: 2650
Authors
Blei, David M. [1 ]
Kucukelbir, Alp [2 ]
McAuliffe, Jon D. [3 ]
Affiliations
[1] Columbia Univ, Dept Comp Sci & Stat, New York, NY USA
[2] Columbia Univ, Dept Comp Sci, New York, NY 10027 USA
[3] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
Funding
National Science Foundation (US);
Keywords
Algorithms; Computationally intensive methods; Statistical computing; Maximum likelihood; Bayesian model; Asymptotic normality; Spatiotemporal model; Mixture models; Regression; Approximation; Information; Selection; Sparse
DOI
10.1080/01621459.2017.1285773
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline Classification Codes
020208 ; 070103 ; 0714 ;
Abstract
One of the core problems of modern statistics is to approximate difficult-to-compute probability densities. This problem is especially important in Bayesian statistics, which frames all inference about unknown quantities as a calculation involving the posterior density. In this article, we review variational inference (VI), a method from machine learning that approximates probability densities through optimization. VI has been used in many applications and tends to be faster than classical methods, such as Markov chain Monte Carlo sampling. The idea behind VI is to first posit a family of densities and then to find a member of that family which is close to the target density. Closeness is measured by Kullback-Leibler divergence. We review the ideas behind mean-field variational inference, discuss the special case of VI applied to exponential family models, present a full example with a Bayesian mixture of Gaussians, and derive a variant that uses stochastic optimization to scale up to massive data. We discuss modern research in VI and highlight important open problems. VI is powerful, but it is not yet well understood. Our hope in writing this article is to catalyze statistical research on this class of algorithms. Supplementary materials for this article are available online.
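The optimization the abstract describes is compact enough to state as a pair of equations. The following is a minimal LaTeX sketch using common VI notation (latent variables z, observations x, variational family Q) — an assumption on our part, since the record itself fixes no symbols:

% VI as optimization: find the member of the family Q closest in
% KL divergence to the exact posterior p(z | x).
q^{*}(\mathbf{z}) = \operatorname*{arg\,min}_{q \in \mathcal{Q}} \, \mathrm{KL}\bigl( q(\mathbf{z}) \,\|\, p(\mathbf{z} \mid \mathbf{x}) \bigr)

% The KL term contains the intractable evidence log p(x), so one
% equivalently maximizes the evidence lower bound (ELBO):
\mathrm{ELBO}(q) = \mathbb{E}_{q}\bigl[ \log p(\mathbf{z}, \mathbf{x}) \bigr] - \mathbb{E}_{q}\bigl[ \log q(\mathbf{z}) \bigr] = \log p(\mathbf{x}) - \mathrm{KL}\bigl( q \,\|\, p(\cdot \mid \mathbf{x}) \bigr)

Mean-field VI, reviewed in the paper, additionally factorizes q(\mathbf{z}) = \prod_{j} q_{j}(z_{j}), which is what makes coordinate-ascent updates on the ELBO tractable.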
Pages: 859-877
Page count: 19