Efficient BackProp

Cited by: 2460
Authors
LeCun, Y
Bottou, L
Orr, GB
Müller, KR
Affiliations
[1] AT&T Bell Labs, Res, Image Proc Res Dept, Red Bank, NJ 07701 USA
[2] Willamette Univ, Salem, OR 97301 USA
[3] GMD FIRST, D-12489 Berlin, Germany
Source
NEURAL NETWORKS: TRICKS OF THE TRADE | 1998 / Vol. 1524
DOI
10.1007/3-540-49430-8_2
Chinese Library Classification
TP301 [theory and methods];
Subject Classification Code
081202 ;
Abstract
The convergence of back-propagation learning is analyzed so as to explain common phenomena observed by practitioners. Many undesirable behaviors of backprop can be avoided with tricks that are rarely exposed in serious technical publications. This paper gives some of those tricks and offers explanations of why they work. Many authors have suggested that second-order optimization methods are advantageous for neural net training. It is shown that most "classical" second-order methods are impractical for large neural networks. A few methods are proposed that do not have these limitations.
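As one illustrative example of the kind of trick the abstract alludes to, the sketch below standardizes each input feature (zero mean, unit variance) before running plain stochastic backprop on a small tanh network. The toy data, layer sizes, and learning rate are assumptions chosen for demonstration only; this is not the authors' code and does not reproduce their experiments.

# Illustrative sketch only: standardize inputs, then train a tiny
# tanh network with per-example (stochastic) gradient descent.
# All hyperparameters below are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data with badly scaled input dimensions.
X = rng.normal(size=(256, 2)) * np.array([100.0, 0.01])
y = (X[:, :1] / 100.0) - (X[:, 1:] / 0.01)

# Trick: give each input feature zero mean and unit variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Tiny 2-8-1 network with tanh hidden units.
W1 = rng.normal(scale=1.0 / np.sqrt(2), size=(2, 8))
b1 = np.zeros(8)
W2 = rng.normal(scale=1.0 / np.sqrt(8), size=(8, 1))
b2 = np.zeros(1)
lr = 0.05

for epoch in range(200):
    for i in rng.permutation(len(X_std)):       # stochastic updates
        x, t = X_std[i:i + 1], y[i:i + 1]
        h = np.tanh(x @ W1 + b1)                # forward pass
        out = h @ W2 + b2
        err = out - t                           # squared-error gradient
        # Backward pass (chain rule through tanh).
        dW2 = h.T @ err
        db2 = err.sum(axis=0)
        dh = (err @ W2.T) * (1.0 - h ** 2)
        dW1 = x.T @ dh
        db1 = dh.sum(axis=0)
        # Plain SGD step.
        W2 -= lr * dW2; b2 -= lr * db2
        W1 -= lr * dW1; b1 -= lr * db1

print("final MSE:", float(np.mean((np.tanh(X_std @ W1 + b1) @ W2 + b2 - y) ** 2)))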
Pages: 9-50
Number of pages: 42