Markovian architectural bias of recurrent neural networks

Cited by: 143
Authors
Tino, P [1]
Cernansky, M [2]
Benusková, L [3]
Affiliations
[1] Univ Birmingham, Sch Comp Sci, Birmingham B15 2TT, W Midlands, England
[2] Slovak Univ Technol Bratislava, Fac Elect Engn & Informat Technol, Bratislava 81219, Slovakia
[3] Comenius Univ, Fac Math Phys & Informat, Bratislava 84248, Slovakia
Source
IEEE TRANSACTIONS ON NEURAL NETWORKS | 2004, Vol. 15, No. 1
Keywords
complex symbolic sequences; information latching problem; iterative function systems; Markov models; recurrent neural networks (RNNs);
DOI
10.1109/TNN.2003.820839
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we elaborate upon the claim that clustering in the recurrent layer of recurrent neural networks (RNNs) reflects meaningful information processing states even prior to training [1], [2]. By concentrating on activation clusters in RNNs, without discarding the continuous state-space dynamics of the network, we extract predictive models that we call neural prediction machines (NPMs). When RNNs with sigmoid activation functions are initialized with small weights (a common technique in the RNN community), the clusters of recurrent activations that emerge prior to training are indeed meaningful and correspond to Markov prediction contexts. In this case, the extracted NPMs correspond to a class of Markov models called variable memory length Markov models (VLMMs). To appreciate how much information has really been induced during training, the RNN performance should always be compared with that of VLMMs and of NPMs extracted before training, as "null" base models. Our arguments are supported by experiments on a chaotic symbolic sequence and on a context-free language with a deep recursive structure.
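To make the NPM construction described in the abstract concrete, the following Python sketch (an illustrative reconstruction under our own assumptions, not the authors' code) drives an untrained sigmoid RNN with small random weights over a symbolic sequence, clusters the recurrent-layer activations with k-means, and turns each cluster into a Markov-style prediction context with smoothed next-symbol counts. The hidden size, weight scale, cluster count, and smoothing constant are arbitrary illustrative choices, not values from the paper.

# Sketch of NPM extraction from an untrained small-weight sigmoid RNN.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy binary symbolic sequence (stands in for the chaotic sequence used in the paper).
alphabet_size = 2
seq = rng.integers(0, alphabet_size, size=2000)

# Untrained sigmoid RNN, small random weights (the initialization the paper studies).
n_hidden, scale = 8, 0.5                      # illustrative assumptions
W_in = rng.normal(0, scale, (n_hidden, alphabet_size))
W_rec = rng.normal(0, scale, (n_hidden, n_hidden))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Collect recurrent activations while driving the network with the sequence.
h = np.zeros(n_hidden)
states = []
for s in seq:
    x = np.eye(alphabet_size)[s]              # one-hot input encoding
    h = sigmoid(W_in @ x + W_rec @ h)
    states.append(h.copy())
states = np.array(states)

# Cluster the activations; each cluster acts as one NPM prediction context.
n_clusters = 4
labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(states)

# Estimate next-symbol probabilities per context (Laplace smoothing).
counts = np.ones((n_clusters, alphabet_size))
for t in range(len(seq) - 1):
    counts[labels[t], seq[t + 1]] += 1
npm = counts / counts.sum(axis=1, keepdims=True)
print("per-context next-symbol distributions:\n", npm)

Because the weights are small, the clusters found here approximate Markov prediction contexts (recent input histories), so this pre-training NPM plays the role of the "null" base model against which a trained RNN should be compared.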
Pages: 6-15
Page count: 10
References
49 items in total (first 10 shown)
  • [1] Anderberg, M. R. Cluster Analysis for Applications. 1973. DOI: 10.1016/C2013-0-06161-0
  • [2] Barnsley, M. Fractals Everywhere. 1988.
  • [3] Bengio, Y., Simard, P., Frasconi, P. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 1994, 5(2): 157-166.
  • [4] Bengio, Y. Proceedings of the 1993 IEEE International Conference on Neural Networks, 1993, vol. 3, p. 1283.
  • [5] Blair, A. D., Pollack, J. B. Analysis of dynamical recognizers. Neural Computation, 1997, 9(5): 1127-1142.
  • [6] Bühlmann, P. Annals of Statistics, 1999, 27: 480.
  • [7] Casey, M. The dynamics of discrete-time computation, with application to recurrent neural networks and finite state machine extraction. Neural Computation, 1996, 8(6): 1135-1178.
  • [8] Christiansen, M. H. Cognitive Science, 1999, 23: 417. DOI: 10.1207/s15516709cog2304_2
  • [9] Cleeremans, A., Servan-Schreiber, D., McClelland, J. L. Finite state automata and simple recurrent networks. Neural Computation, 1989, 1(3): 372-381.
  • [10] Doya, K. Proceedings of the 1992 IEEE International Symposium on Circuits and Systems, 1992, vols. 1-6, p. 2777. DOI: 10.1109/ISCAS.1992.230622