Architectural bias in recurrent neural networks: Fractal analysis

Cited by: 16
Authors
Tino, P [1]
Hammer, B [2]
Affiliations
[1] Aston Univ, Birmingham B4 7ET, W Midlands, England
[2] Univ Osnabruck, D-49069 Osnabruck, Germany
DOI
10.1162/08997660360675099
CLC number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We have recently shown that when initialized with "small" weights, recurrent neural networks (RNNs) with standard sigmoid-type activation functions are inherently biased toward Markov models; even prior to any training, RNN dynamics can be readily used to extract finite-memory machines (Hammer & Tino, 2002; Tino, Cernansky, & Benuskova, 2002a, 2002b). Following Christiansen and Chater (1999), we refer to this phenomenon as the architectural bias of RNNs. In this article, we extend our work on the architectural bias in RNNs by performing a rigorous fractal analysis of recurrent activation patterns. We assume the network is driven by sequences obtained by traversing an underlying finite-state transition diagram, a scenario that has frequently been considered in the past, for example, when studying RNN-based learning and implementation of regular grammars and finite-state transducers. We obtain lower and upper bounds on various types of fractal dimensions, such as box-counting and Hausdorff dimensions. It turns out that not only can the recurrent activations inside RNNs with small initial weights be exploited to build Markovian predictive models, but the activations also form fractal clusters whose dimension can be bounded by the scaled entropy of the underlying driving source. The scaling factors are fixed and are given by the RNN parameters.
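The result summarized above has a simple operational reading: with small weights, each input symbol induces a contractive map on the RNN state space, so a network driven by a finite-state source acts like an iterated function system whose attractor forms fractal activation clusters, with dimension roughly bounded by the source entropy scaled by the log of the contraction rates. Below is a minimal illustrative sketch of this effect, not the authors' code; the architecture, parameter values, and names (N_HIDDEN, W_SCALE, the transition matrix P) are assumptions chosen for demonstration.

```python
# Sketch only: drive a small-weight tanh RNN with symbols emitted by a
# two-state transition diagram and record the recurrent activations.
# With small weights, each symbol's state-update map is a contraction,
# so the activations settle onto fractal clusters (an IFS attractor).
import numpy as np

rng = np.random.default_rng(0)

N_HIDDEN = 2    # 2-D state space, easy to inspect (assumed value)
N_SYMBOLS = 2   # alphabet of the driving finite-state source
W_SCALE = 0.5   # "small" weights -> contractive transition maps

W_rec = W_SCALE * rng.standard_normal((N_HIDDEN, N_HIDDEN))
W_in = {s: W_SCALE * rng.standard_normal(N_HIDDEN) for s in range(N_SYMBOLS)}

def step(h, s):
    """One recurrent update; for small weights this is a contraction."""
    return np.tanh(W_rec @ h + W_in[s])

# Driving source: a two-state Markov chain over the symbols {0, 1}
# (a minimal finite-state transition diagram).
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
state = 0
h = np.zeros(N_HIDDEN)
points = []
for t in range(20000):
    state = rng.choice(N_SYMBOLS, p=P[state])  # next symbol = next state
    h = step(h, state)
    if t > 100:                                # discard the transient
        points.append(h)
points = np.array(points)

# Crude box-counting dimension estimate: count occupied boxes at several
# scales and fit the slope of log N(eps) against log(1/eps).
scales = [0.1, 0.05, 0.025, 0.0125]
counts = [len({tuple(np.floor(p / eps).astype(int)) for p in points})
          for eps in scales]
slope = np.polyfit(np.log(1.0 / np.array(scales)), np.log(counts), 1)[0]
print(f"box-counting dimension estimate: {slope:.2f}")
```

Shrinking W_SCALE strengthens the contractions and should push the estimated dimension down, while a higher-entropy driving source pushes it up; that is the qualitative content of the scaled-entropy bound.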
Pages: 1931-1957
Page count: 27
References
41 entries in total
[1] [Anonymous]. Handbook of Theoretical Computer Science.
[2] Barnsley, M. F. (2014). Fractals Everywhere.
[3] Barnsley, M. F., Elton, J. H., & Hardin, D. P. (1989). Recurrent iterated function systems. Constructive Approximation, 5(1), 3-31.
[4] Blair, A. D., & Pollack, J. B. (1997). Analysis of dynamical recognizers. Neural Computation, 9(5), 1127-1142.
[5] Bodén, M., & Wiles, J. (2002). On learning context-free and context-sensitive languages. IEEE Transactions on Neural Networks, 13(2), 491-493.
[6] Bodén, M. (in press). Applied Intelligence.
[7] Casey, M. (1996). The dynamics of discrete-time computation, with application to recurrent neural networks and finite state machine extraction. Neural Computation, 8(6), 1135-1178.
[8] Christiansen, M. H., & Chater, N. (1999). Connectionist natural language processing: The state of the art. Cognitive Science, 23(4), 417. DOI: 10.1207/s15516709cog2304_2
[9] Cleeremans, A., Servan-Schreiber, D., & McClelland, J. L. (1989). Finite state automata and simple recurrent networks. Neural Computation, 1(3), 372-381.
[10] Culik, K., & Dube, S. (1993). Affine automata and related techniques for generation of complex images. Theoretical Computer Science, 116(2), 373-398.