Dynamics of learning near singularities in layered networks

Times cited: 59
Authors
Wei, Haikun [1 ,2 ,3 ]
Zhang, Jun [1 ,4 ]
Cousseau, Florent [1 ,5 ]
Ozeki, Tomoko [1 ,6 ]
Amari, Shun-Ichi [1 ]
Affiliations
[1] RIKEN Brain Sci Inst, Wako, Saitama 3510198, Japan
[2] SE Univ, Nanjing 210096, Peoples R China
[3] Kyushu Inst Technol, Kitakyushu, Fukuoka 8080196, Japan
[4] Univ Michigan, Ann Arbor, MI 48109 USA
[5] Univ Tokyo, Chiba 2778561, Japan
[6] Tokai Univ, Kanagawa 2591292, Japan
DOI
10.1162/neco.2007.12-06-414
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We explicitly analyze the trajectories of learning near singularities in hierarchical networks, such as multilayer perceptrons and radial basis function networks, whose hidden nodes exhibit permutation symmetry, and we show the general properties of these trajectories. This symmetry induces singularities in the parameter space, where the Fisher information matrix degenerates and where, owing to the geometric structure of the singularity, anomalous learning behaviors arise, most notably plateaus in gradient descent learning. We plot dynamic vector fields to demonstrate the universal trajectories of learning near singularities. Depending on the stability of the singularity and the initial parameters of learning, the singularity induces two types of plateaus: the on-singularity plateau and the near-singularity plateau. The results presented in this letter apply universally to a wide class of hierarchical models. A detailed stability analysis of the learning dynamics in radial basis function networks and multilayer perceptrons will be presented in separate work.
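The claim that permutation symmetry makes the Fisher information matrix degenerate can be checked numerically on a toy model. The sketch below is an illustration under stated assumptions, not the authors' code: it assumes a two-hidden-unit network with scalar input, f(x; θ) = v1·tanh(w1·x) + v2·tanh(w2·x) with θ = (w1, w2, v1, v2), unit-variance Gaussian output noise (so G(θ) = E[∇f ∇fᵀ]), and an empirical average over inputs x ~ N(0, 1). The helper names `grad_f`, `fisher_information`, and `fisher_rank` are hypothetical. It compares a regular point with two singular points: one where the hidden units coincide (w1 = w2) and one where a unit has zero output weight (v2 = 0). It does not reproduce the plateau dynamics studied in the paper.

```python
import numpy as np

# Hypothetical toy model for illustration (not the paper's exact setup):
#   f(x; theta) = v1 * tanh(w1 * x) + v2 * tanh(w2 * x),  theta = (w1, w2, v1, v2)
# Assuming unit-variance Gaussian output noise, the Fisher information is
#   G(theta) = E_x[ grad f(x; theta) grad f(x; theta)^T ].


def grad_f(theta, x):
    """Return the (n, 4) Jacobian of f with respect to theta, one row per input."""
    w1, w2, v1, v2 = theta
    t1, t2 = np.tanh(w1 * x), np.tanh(w2 * x)
    return np.stack(
        [
            v1 * x * (1.0 - t1**2),  # df/dw1
            v2 * x * (1.0 - t2**2),  # df/dw2
            t1,                      # df/dv1
            t2,                      # df/dv2
        ],
        axis=1,
    )


def fisher_information(theta, x):
    """Empirical Fisher matrix G = J^T J / n over the input sample x."""
    J = grad_f(theta, x)
    return J.T @ J / len(x)


def fisher_rank(theta, x):
    """Rank of G, computed from J (rank(J^T J) == rank(J)) for numerical stability."""
    return int(np.linalg.matrix_rank(grad_f(theta, x)))


rng = np.random.default_rng(0)
x = rng.standard_normal(2000)  # inputs x ~ N(0, 1)

theta_generic = (0.5, 1.5, 1.0, -0.7)  # distinct hidden units: regular point
theta_overlap = (1.0, 1.0, 1.0, -0.7)  # w1 == w2: the two hidden units coincide
theta_elim = (0.5, 1.5, 1.0, 0.0)      # v2 == 0: the second unit has no effect

cases = [
    ("generic", theta_generic),
    ("w1 == w2", theta_overlap),
    ("v2 == 0", theta_elim),
]
for name, theta in cases:
    G = fisher_information(theta, x)
    smallest = np.linalg.eigvalsh(G)[0]  # eigenvalues in ascending order
    print(f"{name:9s} rank(G) = {fisher_rank(theta, x)}, smallest eigenvalue = {smallest:.3e}")
```

At the regular point G has full rank 4. When w1 = w2, the columns for v1 and v2 are identical and those for w1 and w2 are proportional, so the rank drops to 2; when v2 = 0, the column for w2 vanishes and the rank drops to 3. This rank loss is the degeneracy of the Fisher information that the abstract associates with the singularities.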
Pages: 813-843
Number of pages: 31