Variational Bayesian Learning of Directed Graphical Models with Hidden Variables

Cited by: 57

Authors:

Beal, Matthew J. [1]

Ghahramani, Zoubin [2]

Affiliations:

[1] SUNY Buffalo, Buffalo, NY 14260 USA

[2] UCL, Gatsby Computational Neuroscience Unit, London, England

Source:

BAYESIAN ANALYSIS | 2006 / Vol. 1 / Issue 04

Keywords:

Approximate Bayesian Inference; Bayes Factors; Directed Acyclic Graphs; EM Algorithm; Graphical Models; Markov Chain Monte Carlo; Model Selection; Variational Bayes;

DOI:

10.1214/06-BA126

Chinese Library Classification:

O1 [Mathematics];

Subject classification codes:

0701 ; 070101 ;

Abstract:

A key problem in statistics and machine learning is inferring suitable structure of a model given some observed data. A Bayesian approach to model comparison makes use of the marginal likelihood of each candidate model to form a posterior distribution over models; unfortunately for most models of interest, notably those containing hidden or latent variables, the marginal likelihood is intractable to compute. We present the variational Bayesian (VB) algorithm for directed graphical models, which optimises a lower bound approximation to the marginal likelihood in a procedure similar to the standard EM algorithm. We show that for a large class of models, which we call conjugate exponential, the VB algorithm is a straightforward generalisation of the EM algorithm that incorporates uncertainty over model parameters. In a thorough case study using a small class of bipartite DAGs containing hidden variables, we compare the accuracy of the VB approximation to existing asymptotic-data approximations such as the Bayesian Information Criterion (BIC) and the Cheeseman-Stutz (CS) criterion, and also to a sampling based gold standard, Annealed Importance Sampling (AIS). We find that the VB algorithm is empirically superior to CS and BIC, and much faster than AIS. Moreover, we prove that a VB approximation can always be constructed in such a way that guarantees it to be more accurate than the CS approximation.

Pages: 793-831

Page count: 39
