The flip-the-state transition operator for restricted Boltzmann machines

Cited by: 14
Authors
Brugge, Kai [1, 2]
Fischer, Asja [3]
Igel, Christian [4]
Affiliations
[1] Univ Helsinki, Dept Comp Sci, Helsinki 00014, Finland
[2] Helsinki Inst Informat Technol HIIT, Helsinki 00014, Finland
[3] Ruhr Univ Bochum, Inst Neuroinformat, D-44780 Bochum, Germany
[4] Univ Copenhagen, Dept Comp Sci, DK-2100 Copenhagen, Denmark
Keywords
Restricted Boltzmann machine; Markov chain Monte Carlo; Gibbs sampling; Mixing rate; Contrastive divergence learning; Parallel tempering
DOI
10.1007/s10994-013-5390-3
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Most learning and sampling algorithms for restricted Boltzmann machines (RBMs) rely on Markov chain Monte Carlo (MCMC) methods using Gibbs sampling. The most prominent examples are Contrastive Divergence learning (CD) and its variants as well as Parallel Tempering (PT). The performance of these methods strongly depends on the mixing properties of the Gibbs chain. We propose a Metropolis-type MCMC algorithm relying on a transition operator maximizing the probability of state changes. It is shown that the operator induces an irreducible, aperiodic, and hence properly converging Markov chain, also for the typically used periodic update schemes. The transition operator can replace Gibbs sampling in RBM learning algorithms without producing computational overhead. It is shown empirically that this leads to faster mixing and in turn to more accurate learning.
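The contrast the abstract draws between Gibbs sampling and a Metropolis-type operator that favors state changes can be sketched as follows. This is an illustrative sketch, not code from the paper: it assumes binary units and the standard Metropolized-Gibbs acceptance rule (flip a unit with probability min(1, p(flipped) / p(current))), which is one common way to maximize the flip probability while keeping the conditional distribution invariant. All function and parameter names (`gibbs_update`, `flip_update`, `sample_rbm`, `W`, `b`, `c`) are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gibbs_update(state, p_on, rng):
    """Gibbs step: resample every unit from its conditional, ignoring its current value."""
    return (rng.random(p_on.shape) < p_on).astype(state.dtype)


def flip_update(state, p_on, rng):
    """Metropolis-type step (assumed rule): propose flipping each unit and
    accept with probability min(1, p(flipped) / p(current))."""
    p_current = np.where(state == 1, p_on, 1.0 - p_on)
    p_flipped = 1.0 - p_current
    # Guard against division by zero when the current value has probability 0.
    accept = np.minimum(1.0, p_flipped / np.maximum(p_current, 1e-12))
    flip = rng.random(p_on.shape) < accept
    return np.where(flip, 1 - state, state)


def sample_rbm(W, b, c, v0, n_steps, update, rng):
    """Alternate block updates of the hidden and visible layers.

    W has shape (n_visible, n_hidden); b and c are the visible and hidden biases.
    `update` is either gibbs_update or flip_update.
    """
    v = v0.copy()
    h = (rng.random(c.shape) < sigmoid(v @ W + c)).astype(v.dtype)
    for _ in range(n_steps):
        h = update(h, sigmoid(v @ W + c), rng)
        v = update(v, sigmoid(h @ W.T + b), rng)
    return v, h
```

Both update functions take the same conditional probabilities as input, so swapping one for the other changes only the final sampling step; this is consistent with the abstract's claim that the operator can replace Gibbs sampling without computational overhead. Because units within a layer are conditionally independent given the other layer, applying the per-unit rule to a whole layer at once still leaves the layer's conditional distribution invariant.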
Pages: 53-69
Number of pages: 17