Long short-term memory

Cited by: 11346
Authors
Hochreiter, S [1]
Schmidhuber, J [1]
Affiliations
[1] IDSIA, CH-6900 LUGANO, SWITZERLAND
DOI
10.1162/neco.1997.9.8.1735
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, back propagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs, and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.
Pages: 1735-1780
Page count: 46
Related papers
42 in total
  • [1] ALMEIDA LB, 1987, 1ST P IEEE INT C NEU, V2, P609
  • [2] [Anonymous], 1993, P ADV NEUR INF PROC
  • [3] [Anonymous], 1989, NUCCS8927
  • [4] [Anonymous], 1993, Advances in Neural Information Processing Systems
  • [5] [Anonymous], ADVANCES IN NEURAL I
  • [6] [Anonymous], 1993, Advances in Neural Information Processing Systems
  • [7] [Anonymous], 1991, Advances in Neural Information Processing Systems
  • [8] Contrastive Learning and Neural Oscillations
    Baldi, Pierre
    Pineda, Fernando
    [J]. NEURAL COMPUTATION, 1991, 3 (04) : 526 - 545
  • [9] LEARNING LONG-TERM DEPENDENCIES WITH GRADIENT DESCENT IS DIFFICULT
    BENGIO, Y
    SIMARD, P
    FRASCONI, P
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS, 1994, 5 (02) : 157 - 166
  • [10] BENGIO Y, 1994, ADV NEURAL INFORMATI, V6, P75