GENERALIZATION IN A LINEAR PERCEPTRON IN THE PRESENCE OF NOISE

Cited: 80
Authors
KROGH, A
HERTZ, JA
Affiliations
[1] NIELS BOHR INST,DK-2100 COPENHAGEN,DENMARK
[2] NORDITA,DK-2100 COPENHAGEN,DENMARK
Source
JOURNAL OF PHYSICS A-MATHEMATICAL AND GENERAL | 1992 / Vol. 25 / No. 05
DOI
10.1088/0305-4470/25/5/020
Chinese Library Classification
O4 [Physics]
Discipline Classification Code
0702
Abstract
We study the evolution of the generalization ability of a simple linear perceptron with N inputs which learns to imitate a 'teacher perceptron'. The system is trained on p = alpha N example inputs drawn from some distribution, and the generalization ability is measured by the average agreement with the teacher on test examples drawn from the same distribution. The dynamics can be solved analytically and exhibit a phase transition from imperfect to perfect generalization at alpha = 1, when there are no errors (static noise) in the training examples. If the examples are produced by an erroneous teacher, overfitting is observed, i.e. the generalization error starts to increase after a finite training time. It is shown that a weight decay of the same size as the variance of the noise (errors) on the teacher improves the generalization and suppresses the overfitting. The generalization error as a function of time is calculated numerically for various values of the parameters. Finally, dynamic noise in the training is considered. White noise on the inputs corresponds on average to a weight decay and can thus improve generalization, whereas white noise on the weights or the output degrades generalization. Generalization is particularly sensitive to noise on the weights (for alpha < 1), where it makes the error increase steadily with time, but this effect is also shown to be damped by a weight decay. Weight noise and output noise act similarly above the transition at alpha = 1.
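The setting in the abstract lends itself to a brief numerical illustration. The sketch below is not the authors' analytic treatment; it assumes discrete gradient descent on the squared training error, Gaussian inputs, and arbitrary choices of N, alpha, the noise variance, and the learning rate, and it compares training without weight decay against a weight decay of the same size as the noise variance on the teacher.

```python
import numpy as np

# Minimal numerical sketch of the setting in the abstract. Assumptions (not from
# the paper): discrete gradient descent on the squared training error, Gaussian
# inputs, and the particular values of N, alpha, the noise variance and the
# learning rate chosen here. The paper itself treats the learning dynamics
# analytically.

rng = np.random.default_rng(0)

N = 200                    # number of inputs
alpha = 0.5                # p = alpha * N training examples (below the transition)
p = int(alpha * N)
sigma2 = 0.2               # variance of the errors ("static noise") on the teacher
eta = 0.05                 # learning rate (assumed)
steps = 3000

w_teacher = rng.normal(size=N) / np.sqrt(N)      # the 'teacher perceptron'
X = rng.normal(size=(p, N))                      # training inputs
y = X @ w_teacher + np.sqrt(sigma2) * rng.normal(size=p)   # erroneous teacher outputs

X_test = rng.normal(size=(5000, N))              # test examples, same distribution
y_test = X_test @ w_teacher                      # noise-free teacher on test examples

def gen_error(w):
    """Average squared disagreement with the teacher on the test examples."""
    return np.mean((X_test @ w - y_test) ** 2)

# Compare no weight decay with a decay of the same size as the noise variance.
for lam in (0.0, sigma2):
    w = np.zeros(N)
    errors = []
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / p             # gradient of the training error
        w -= eta * (grad + lam * w)              # gradient step with weight decay
        errors.append(gen_error(w))
    print(f"weight decay {lam:.2f}: best error {min(errors):.4f}, "
          f"final error {errors[-1]:.4f}")
```

With these (hypothetical) parameters one would typically see the run without weight decay reach its best generalization error partway through training and then drift upward again, while the run with decay equal to the noise variance stays close to its best value, illustrating the overfitting and weight-decay claims of the abstract.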
Pages: 1135 - 1147
Page count: 13