LEARNING BY ONLINE GRADIENT DESCENT

Cited by: 126
Authors
BIEHL, M [1]
SCHWARZE, H [1]
Affiliation
[1] LUND UNIV,DEPT THEORET PHYS,S-22362 LUND,SWEDEN
Source
JOURNAL OF PHYSICS A-MATHEMATICAL AND GENERAL | 1995 / Vol. 28 / No. 3
DOI
10.1088/0305-4470/28/3/018
Chinese Library Classification number
O4 [Physics]
Discipline classification code
0702
Abstract
We study on-line gradient-descent learning in multilayer networks analytically and numerically. The training is based on randomly drawn inputs and their corresponding outputs as defined by a target rule. In the thermodynamic limit we derive deterministic differential equations for the order parameters of the problem which allow an exact calculation of the evolution of the generalization error. First we consider a single-layer perceptron with sigmoidal activation function learning a target rule defined by a network of the same architecture. For this model the generalization error decays exponentially with the number of training examples if the learning rate is sufficiently small. However, if the learning rate is increased above a critical value, perfect learning is no longer possible. For architectures with hidden layers and fixed hidden-to-output weights, such as the parity and the committee machine, we find additional effects related to the existence of symmetries in these problems.
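The single-layer setting described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: a student perceptron is trained on-line, one randomly drawn input at a time, to match a teacher perceptron of the same architecture. The choice of `tanh` as the sigmoidal activation, the 1/sqrt(N) scaling of the fields, the Monte Carlo estimate of the generalization error, and all names and parameter values (`train_online`, `eta`, `alpha_max`, and so on) are assumptions made for illustration.

```python
import numpy as np


def g(x):
    # Sigmoidal activation. tanh is an illustrative assumption;
    # the abstract only specifies "sigmoidal".
    return np.tanh(x)


def g_prime(x):
    # Derivative of tanh.
    return 1.0 - np.tanh(x) ** 2


def generalization_error(J, B, rng, n_test=2000):
    # Monte Carlo estimate of the mean squared deviation between
    # student (weights J) and teacher (weights B) on fresh random inputs.
    N = B.size
    X = rng.standard_normal((n_test, N))
    student = g(X @ J / np.sqrt(N))
    teacher = g(X @ B / np.sqrt(N))
    return 0.5 * np.mean((student - teacher) ** 2)


def train_online(N=200, eta=0.5, alpha_max=30.0, seed=0):
    # alpha = (number of examples) / N plays the role of training time.
    rng = np.random.default_rng(seed)
    B = rng.standard_normal(N)
    B *= np.sqrt(N) / np.linalg.norm(B)   # normalize teacher: |B|^2 = N
    J = 1e-3 * rng.standard_normal(N)     # student starts near zero
    errors = [generalization_error(J, B, rng)]
    for step in range(1, int(alpha_max * N) + 1):
        xi = rng.standard_normal(N)       # one randomly drawn input
        x = J @ xi / np.sqrt(N)           # student field
        y = B @ xi / np.sqrt(N)           # teacher field
        # Gradient step on the single-example error 0.5 * (g(y) - g(x))^2.
        J += (eta / np.sqrt(N)) * (g(y) - g(x)) * g_prime(x) * xi
        if step % N == 0:                 # record once per unit of alpha
            errors.append(generalization_error(J, B, rng))
    return np.array(errors)


if __name__ == "__main__":
    errs = train_online()
    print("initial error:", errs[0])
    print("final error:  ", errs[-1])
```

Rerunning `train_online` with different values of `eta` is one way to explore numerically how the learning rate affects the decay of the error in this toy setting. The sketch makes no claim about the critical value derived in the paper.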
Pages: 643-656
Page count: 14