Adaptive natural gradient learning algorithms for various stochastic models

Cited by: 106
Authors
Park, H [1]
Amari, SI
Fukumizu, K
Affiliations
[1] Yonsei Univ, Dept Comp Sci, Seoul 120749, South Korea
[2] RIKEN, Brain Sci Inst, Wako, Saitama 35101, Japan
[3] Inst Stat Math, Tokyo 106, Japan
Keywords
feedforward neural network; gradient descent learning; plateau problem; natural gradient learning; adaptive natural gradient learning;
DOI
10.1016/S0893-6080(00)00051-4
CLC Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The natural gradient method has an ideal dynamic behavior that resolves the slow learning speed of standard gradient descent caused by plateaus. However, it requires the Fisher information matrix and its inverse, which makes a direct implementation of natural gradient learning practically infeasible. To address this problem, a preliminary study proposed an adaptive method for estimating the inverse of the Fisher information matrix, called the adaptive natural gradient learning method. In this paper, we show that the adaptive natural gradient method can be extended to a wide class of stochastic models: regression with an arbitrary noise model and classification with an arbitrary number of classes. We give explicit forms of the adaptive natural gradient for these models and confirm the practical advantage of the proposed algorithms through computational experiments on benchmark problems. (C) 2000 Elsevier Science Ltd. All rights reserved.
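As a rough illustration of the idea summarized in the abstract, the sketch below shows how one adaptive natural gradient step might look for single-output regression with additive Gaussian noise: a running estimate of the inverse Fisher information matrix is updated from the current model gradient and then used to precondition the ordinary gradient, in the spirit of the rule in reference [1] below. This is a minimal sketch under assumed settings; the function name adaptive_natural_gradient_step, the model_grad callback, and the step sizes eta and eps are illustrative choices, not the paper's notation or exact algorithm.

import numpy as np

def adaptive_natural_gradient_step(theta, x, y, model_grad, G_inv,
                                   eta=0.01, eps=0.001):
    # One sketched step of adaptive natural gradient learning for
    # single-output regression with Gaussian noise (illustrative only).
    #   theta      : current parameter vector
    #   model_grad : callback returning (prediction, gradient of the
    #                prediction with respect to theta)
    #   G_inv      : running estimate of the inverse Fisher information matrix
    #   eta, eps   : step sizes for the parameters and for G_inv (assumed values)
    y_hat, grad_f = model_grad(theta, x)

    # Gradient of the squared-error loss (the Gaussian negative
    # log-likelihood up to a constant factor).
    grad_loss = (y_hat - y) * grad_f

    # Adaptive update of the inverse Fisher estimate:
    #   G_inv <- (1 + eps) * G_inv - eps * (G_inv grad_f)(G_inv grad_f)^T
    v = G_inv @ grad_f
    G_inv = (1.0 + eps) * G_inv - eps * np.outer(v, v)

    # Natural gradient step: precondition the ordinary gradient with G_inv.
    theta = theta - eta * (G_inv @ grad_loss)
    return theta, G_inv

# Example usage with a hypothetical linear model f(x; theta) = theta . x:
#   model_grad = lambda theta, x: (theta @ x, x)
#   theta, G_inv = adaptive_natural_gradient_step(theta, x, y, model_grad, G_inv)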
Pages: 755-764
Page count: 10
Related papers
14 in total; the first 10 are listed below.
[1] Amari S, Park H, Fukumizu K. Adaptive method of realizing natural gradient learning for multilayer perceptrons. Neural Computation, 2000, 12(6): 1399-1409.
[2] Amari S. Natural gradient works efficiently in learning. Neural Computation, 1998, 10(2): 251-276.
[3] Amari S. Backpropagation and stochastic gradient descent method. Neurocomputing, 1993, 5(4-5): 185-196.
[4] Amari S, 2000. Information Geometry.
[5] Amari S, 1985. Springer Lecture Notes in Statistics, Vol. 28.
[6] Ampazis N, Perantonis SJ, Taylor JG. Dynamics of multilayer networks in the vicinity of temporary minima. Neural Networks, 1999, 12(1): 43-58.
[7] Bishop C M. Neural Networks for Pattern Recognition, 1995.
[8] Joost M. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 1996, 6: 117.
[9] LeCun Y, 1998. Springer Lecture Notes, Vol. 1524.
[10] Rattray M, Saad D, Amari S. Natural gradient descent for on-line learning. Physical Review Letters, 1998, 81(24): 5461-5464.