A fast learning algorithm for deep belief nets

Cited by: 12720

Authors:

Hinton, Geoffrey E. [1]

Osindero, Simon

Teh, Yee-Whye

Affiliations:

[1] Univ Toronto, Dept Comp Sci, Toronto, ON M5S 3G4, Canada

[2] Natl Univ Singapore, Dept Comp Sci, Singapore 117543, Singapore

Source:

NEURAL COMPUTATION | 2006 / Volume 18 / Issue 07

DOI:

10.1162/neco.2006.18.7.1527

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We show how to use "complementary priors" to eliminate the explaining-away effects that make inference difficult in densely connected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm is used to initialize a slower learning procedure that fine-tunes the weights using a contrastive version of the wake-sleep algorithm. After fine-tuning, a network with three hidden layers forms a very good generative model of the joint distribution of handwritten digit images and their labels. This generative model gives better digit classification than the best discriminative learning algorithms. The low-dimensional manifolds on which the digits lie are modeled by long ravines in the free-energy landscape of the top-level associative memory, and it is easy to explore these ravines by using the directed connections to display what the associative memory has in mind.
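The greedy, one-layer-at-a-time idea in the abstract can be illustrated with a minimal NumPy sketch. It assumes each layer is trained as a binary restricted Boltzmann machine using one-step contrastive divergence (CD-1), a per-layer rule the abstract does not spell out. The function names `train_rbm` and `greedy_pretrain`, the learning rate, the layer sizes, and the random toy data are all hypothetical choices made for illustration. The contrastive wake-sleep fine-tuning stage is not shown.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_rbm(data, n_hidden, epochs=10, lr=0.1, rng=None):
    """Train one binary RBM with CD-1 (assumed rule); returns (W, b_vis, b_hid)."""
    rng = np.random.default_rng(0) if rng is None else rng
    n_samples, n_visible = data.shape
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_vis = np.zeros(n_visible)
    b_hid = np.zeros(n_hidden)
    for _ in range(epochs):
        # Positive phase: hidden probabilities driven by the data.
        h_prob = sigmoid(data @ W + b_hid)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: a single Gibbs step (reconstruct, then re-infer).
        v_recon = sigmoid(h_sample @ W.T + b_vis)
        h_recon = sigmoid(v_recon @ W + b_hid)
        # CD-1 update: data statistics minus reconstruction statistics.
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / n_samples
        b_vis += lr * (data - v_recon).mean(axis=0)
        b_hid += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_vis, b_hid


def greedy_pretrain(data, layer_sizes, **kwargs):
    """Stack RBMs: each layer is trained on the hidden activities of the layer below."""
    layers, x = [], data
    for n_hidden in layer_sizes:
        W, b_vis, b_hid = train_rbm(x, n_hidden, **kwargs)
        layers.append((W, b_vis, b_hid))
        # Propagate hidden probabilities upward as input for the next layer.
        x = sigmoid(x @ W + b_hid)
    return layers


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy = (rng.random((64, 20)) < 0.5).astype(float)  # random binary toy data
    stack = greedy_pretrain(toy, [16, 8, 4], epochs=5)
    print([W.shape for W, _, _ in stack])
```

Each call to `train_rbm` sees only the output of the layer beneath it, which is what makes the procedure greedy: lower layers are frozen before the next one is trained.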

References

Pages: 1527-1554

Number of pages: 28

21 entries in total

[1] [Anonymous], [No title captured]
[2] Shape matching and object recognition using shape contexts
Belongie, S
Malik, J
Puzicha, J
[J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2002, 24 (04) : 509 - 522
[3] Carreira-Perpinan M. A., 2005, ARTIF INTELL, P33
[4] Training invariant support vector machines
Decoste, D
Schölkopf, B
[J]. MACHINE LEARNING, 2002, 46 (1-3) : 161 - 190
[5] BOOSTING A WEAK LEARNING ALGORITHM BY MAJORITY
FREUND, Y
[J]. INFORMATION AND COMPUTATION, 1995, 121 (02) : 256 - 285
[6] A NESTED PARTITIONING PROCEDURE FOR NUMERICAL MULTIPLE INTEGRATION
FRIEDMAN, JH
WRIGHT, MH
[J]. ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 1981, 7 (01) : 76 - 92
[7] Training products of experts by minimizing contrastive divergence
Hinton, GE
[J]. NEURAL COMPUTATION, 2002, 14 (08) : 1771 - 1800
[8] THE WAKE-SLEEP ALGORITHM FOR UNSUPERVISED NEURAL NETWORKS
HINTON, GE
DAYAN, P
FREY, BJ
NEAL, RM
[J]. SCIENCE, 1995, 268 (5214) : 1158 - 1161
[9] Gradient-based learning applied to document recognition
Lecun, Y
Bottou, L
Bengio, Y
Haffner, P
[J]. PROCEEDINGS OF THE IEEE, 1998, 86 (11) : 2278 - 2324
[10] Hierarchical Bayesian inference in the visual cortex
Lee, TS
Mumford, D
[J]. JOURNAL OF THE OPTICAL SOCIETY OF AMERICA A-OPTICS IMAGE SCIENCE AND VISION, 2003, 20 (07) : 1434 - 1448
