Efficient Approximations for the Marginal Likelihood of Bayesian Networks with Hidden Variables

Cited: 32
Authors
David Maxwell Chickering
David Heckerman
Institution
[1] Microsoft Research
Source
Machine Learning | 1997, Vol. 29
Keywords
Bayesian model averaging; model selection; multinomial mixtures; clustering; unsupervised learning; Laplace approximation
DOI
Not available
Abstract
We discuss Bayesian methods for model averaging and model selection among Bayesian-network models with hidden variables. In particular, we examine large-sample approximations for the marginal likelihood of naive-Bayes models in which the root node is hidden. Such models are useful for clustering or unsupervised learning. We consider a Laplace approximation and the less accurate but more computationally efficient approximation known as the Bayesian Information Criterion (BIC), which is equivalent to Rissanen's (1987) Minimum Description Length (MDL). Also, we consider approximations that ignore some off-diagonal elements of the observed information matrix and an approximation proposed by Cheeseman and Stutz (1995). We evaluate the accuracy of these approximations using a Monte-Carlo gold standard. In experiments with artificial and real examples, we find that (1) none of the approximations are accurate when used for model averaging, (2) all of the approximations, with the exception of BIC/MDL, are accurate for model selection, (3) among the accurate approximations, the Cheeseman–Stutz and Diagonal approximations are the most computationally efficient, (4) all of the approximations, with the exception of BIC/MDL, can be sensitive to the prior distribution over model parameters, and (5) the Cheeseman–Stutz approximation can be more accurate than the other approximations, including the Laplace approximation, in situations where the parameters in the maximum a posteriori configuration are near a boundary.
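To make the BIC/MDL approximation discussed in the abstract concrete, the sketch below fits a two-class naive-Bayes (latent class) model with a hidden root node by EM and scores it with log p(D | θ̂, M) − (d/2) log N. This is a minimal illustration only, not the paper's code: it assumes binary observed variables, uses the naive count of free parameters for d, and the function names and synthetic data are invented for the example.

```python
import numpy as np

def fit_latent_class_em(X, K=2, n_iter=200, seed=0):
    """EM for a naive-Bayes (latent class) model: a hidden root with K
    states and conditionally independent binary observations X (N x V)."""
    rng = np.random.default_rng(seed)
    N, V = X.shape
    pi = np.full(K, 1.0 / K)                       # P(class = k)
    theta = rng.uniform(0.25, 0.75, size=(K, V))   # P(x_v = 1 | class = k)
    for _ in range(n_iter):
        # E-step: responsibilities P(class | x) for every case.
        log_joint = (np.log(pi)[None, :]
                     + X @ np.log(theta).T
                     + (1.0 - X) @ np.log(1.0 - theta).T)        # N x K
        log_evidence = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
        resp = np.exp(log_joint - log_evidence)
        # M-step: maximum-likelihood updates of the parameters.
        Nk = resp.sum(axis=0)
        pi = Nk / N
        theta = np.clip(resp.T @ X / Nk[:, None], 1e-6, 1.0 - 1e-6)
    # Log-likelihood at the final (approximately ML) parameter values.
    log_joint = (np.log(pi)[None, :]
                 + X @ np.log(theta).T
                 + (1.0 - X) @ np.log(1.0 - theta).T)
    loglik = np.logaddexp.reduce(log_joint, axis=1).sum()
    return pi, theta, loglik

def bic_score(loglik, K, V, N):
    """BIC/MDL approximation: log p(D|M) ~ log p(D|theta_hat, M) - (d/2) log N,
    with d taken as the naive count of free parameters."""
    d = (K - 1) + K * V
    return loglik - 0.5 * d * np.log(N)

# Usage on synthetic binary data (purely illustrative).
X = (np.random.default_rng(1).random((500, 6)) < 0.5).astype(float)
_, _, loglik = fit_latent_class_em(X, K=2)
print(bic_score(loglik, K=2, V=X.shape[1], N=X.shape[0]))
```

Comparing this score across candidate numbers of hidden classes is the model-selection use the paper studies; as the abstract notes, BIC/MDL is the cheapest of the approximations considered but also the one that is not accurate for model selection.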
Pages: 181-212
Page count: 31
References
40 items in total
  • [1] Buntine W., Weigend A. (1994). Computing second derivatives in feed-forward networks: A review. IEEE Transactions on Neural Networks, 5, 480-488.
  • [2] Buntine W. (1994). Operations for learning with graphical models. Journal of Artificial Intelligence Research, 2, 159-225.
  • [3] Buntine W. (1996). A guide to the literature on learning graphical models. IEEE Transactions on Knowledge and Data Engineering, 8, 195-210.
  • [4] Chib S. (1995). Marginal likelihood from the Gibbs output. Journal of the American Statistical Association, 90, 1313-1321.
  • [5] Cooper G., Herskovits E. (1992). A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9, 309-347.
  • [6] Dempster A., Laird N., Rubin D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1-38.
  • [7] Draper D. (1995). Assessment and propagation of model uncertainty (with discussion). Journal of the Royal Statistical Society, Series B, 57, 45-97.
  • [8] Geman S., Geman D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721-742.
  • [9] Haughton D. (1988). On the choice of a model to fit data from an exponential family. Annals of Statistics, 16, 342-355.
  • [10] Heckerman D., Geiger D., Chickering D. (1995). Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20, 197-243.