On the momentum term in gradient descent learning algorithms

Cited by: 1571
Author
Qian, N [1]
Affiliation
[1] Columbia Univ, Ctr Neurobiol & Behav, New York, NY 10032 USA
Funding
National Institutes of Health (USA)
Keywords
momentum; gradient descent learning algorithm; damped harmonic oscillator; critical damping; learning rate; speed of convergence
DOI
10.1016/S0893-6080(98)00116-6
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A momentum term is usually included in the simulations of connectionist learning algorithms. Although it is well known that such a term greatly improves the speed of learning, there have been few rigorous studies of its mechanisms. In this paper, I show that in the limit of continuous time, the momentum parameter is analogous to the mass of Newtonian particles that move through a viscous medium in a conservative force field. The behavior of the system near a local minimum is equivalent to a set of coupled and damped harmonic oscillators. The momentum term improves the speed of convergence by bringing some eigen components of the system closer to critical damping. Similar results can be obtained for the discrete time case used in computer simulations. In particular, I derive the bounds for convergence on learning-rate and momentum parameters, and demonstrate that the momentum term can increase the range of learning rate over which the system converges. The optimal condition for convergence is also analyzed. (C) 1999 Elsevier Science Ltd. All rights reserved.
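The claim that momentum can widen the range of convergent learning rates can be illustrated with a minimal sketch. It assumes the standard momentum update, dw <- -eps * grad + p * dw, applied to a one-dimensional quadratic error E(w) = k * w^2 / 2. The function name `run_momentum_descent` and the parameter names `eps`, `p`, and `k` are illustrative choices, not taken from the paper. For this update on a quadratic, linear stability analysis gives convergence when |p| < 1 and 0 < eps * k < 2 * (1 + p), so a learning rate that diverges without momentum may converge with it.

```python
def run_momentum_descent(eps, p, k=1.0, w0=1.0, steps=200):
    """Gradient descent with momentum on E(w) = 0.5 * k * w**2.

    eps: learning rate, p: momentum parameter, k: curvature (assumed names).
    Returns the weight after `steps` updates.
    """
    w, dw = w0, 0.0
    for _ in range(steps):
        grad = k * w                 # dE/dw for the quadratic error
        dw = -eps * grad + p * dw    # momentum update of the weight change
        w += dw
    return w


# eps * k = 2.5 exceeds the no-momentum limit of 2, so plain descent diverges.
w_plain = run_momentum_descent(eps=2.5, p=0.0)
# With p = 0.5 the limit becomes 2 * (1 + 0.5) = 3, so the same eps converges.
w_momentum = run_momentum_descent(eps=2.5, p=0.5)

print(abs(w_plain) > 1e6, abs(w_momentum) < 1e-6)
```

In a multi-dimensional problem near a local minimum, the same condition would apply separately to each eigenvalue of the Hessian, which is why only some eigen components benefit from a given momentum setting.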
Pages: 145-151
Number of pages: 7