New results on recurrent network training: Unifying the algorithms and accelerating convergence

Cited by: 335
Authors
Atiya, AF [1 ]
Parlos, AG
Affiliations
[1] CALTECH, Dept Elect Engn, Learning Syst Grp, Pasadena, CA 91125 USA
[2] Texas A&M Univ, Dept Mech Engn, College Stn, TX 77843 USA
Source
IEEE TRANSACTIONS ON NEURAL NETWORKS | 2000, Vol. 11, No. 3
Funding
U.S. National Science Foundation
Keywords
backpropagation through time; constrained optimization; gradient approximation; optimal control; real time recurrent learning; recurrent networks;
DOI
10.1109/72.846741
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
How to train recurrent networks efficiently remains a challenging and active research topic. Most of the proposed training approaches are based on computational ways of efficiently obtaining the gradient of the error function, and they fall into five major groups. In this study we present a derivation that unifies these approaches: we demonstrate that they are simply five different ways of solving a particular matrix equation. The second goal of this paper is to develop a new algorithm based on the insights gained from this unified formulation. The new algorithm, which approximates the error gradient, has lower computational complexity for the weight update than the competing techniques on most typical problems, and it reaches the error minimum in far fewer iterations. A desirable property of a recurrent network training algorithm is the ability to update the weights on-line; we have therefore also developed an on-line version of the proposed algorithm, which updates the error gradient approximation recursively.
Pages: 697-709
Page count: 13
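The abstract's keyword list includes backpropagation through time (BPTT) and real-time recurrent learning, two of the gradient-computation approaches the paper unifies. For orientation only, the following minimal NumPy sketch implements plain truncated BPTT for a one-layer tanh recurrent network on a toy next-step prediction task; it illustrates the exact-gradient baseline against which the paper's approximate-gradient algorithm is compared, not the algorithm proposed in the paper itself. The network size, task, and hyperparameters are all illustrative assumptions.

```python
# Minimal truncated-BPTT sketch (illustrative only, NOT the algorithm from
# the paper): a one-layer tanh RNN trained to predict the next value of a
# noisy sine wave. All dimensions and hyperparameters are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, T, lr = 1, 16, 20, 0.05

# Weight matrices: input-to-hidden, hidden-to-hidden, hidden-to-output.
W_in  = rng.normal(0, 0.3, (n_hid, n_in))
W_rec = rng.normal(0, 0.3, (n_hid, n_hid))
W_out = rng.normal(0, 0.3, (1, n_hid))

def bptt_step(xs, ys):
    """One truncated-BPTT pass over a length-T window; returns loss and grads."""
    hs = [np.zeros(n_hid)]
    # Forward pass: h_t = tanh(W_in x_t + W_rec h_{t-1}).
    for x in xs:
        hs.append(np.tanh(W_in @ x + W_rec @ hs[-1]))
    preds = [W_out @ h for h in hs[1:]]
    loss = 0.5 * sum(float((p[0] - y) ** 2) for p, y in zip(preds, ys))

    gW_in, gW_rec, gW_out = map(np.zeros_like, (W_in, W_rec, W_out))
    dh_next = np.zeros(n_hid)
    # Backward pass: accumulate the exact gradient through time.
    for t in reversed(range(T)):
        err = preds[t] - ys[t]                # output error at step t
        gW_out += np.outer(err, hs[t + 1])
        dh = W_out.T @ err + dh_next          # total gradient w.r.t. h_t
        dz = dh * (1.0 - hs[t + 1] ** 2)      # back through tanh
        gW_in  += np.outer(dz, xs[t])
        gW_rec += np.outer(dz, hs[t])
        dh_next = W_rec.T @ dz                # propagate to step t-1
    return loss, gW_in, gW_rec, gW_out

# Training loop on overlapping windows of the sine series.
series = np.sin(0.2 * np.arange(2000)) + 0.05 * rng.normal(size=2000)
for it in range(200):
    s = rng.integers(0, len(series) - T - 1)
    xs = [series[s + t : s + t + 1] for t in range(T)]
    ys = series[s + 1 : s + T + 1]
    loss, gW_in, gW_rec, gW_out = bptt_step(xs, ys)
    W_in -= lr * gW_in; W_rec -= lr * gW_rec; W_out -= lr * gW_out
```

The explicit backward loop over the whole window is the per-update cost that, according to the abstract, the proposed approximate-gradient algorithm undercuts on most typical problems.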