STATISTICAL-MECHANICS OF LEARNING FROM EXAMPLES

Cited by: 344
Authors
SEUNG, HS
SOMPOLINSKY, H
TISHBY, N
Affiliations
[1] HEBREW UNIV JERUSALEM, CTR NEURAL COMPUTAT, IL-91904 JERUSALEM, ISRAEL
[2] AT&T BELL LABS, MURRAY HILL, NJ 07974 USA
Source
PHYSICAL REVIEW A | 1992, Vol. 45, No. 08
DOI
10.1103/PhysRevA.45.6056
Chinese Library Classification
O43 [Optics];
Discipline codes
070207; 0803;
Abstract
Learning from examples in feedforward neural networks is studied within a statistical-mechanical framework. Training is assumed to be stochastic, leading to a Gibbs distribution of networks characterized by a temperature parameter T. Learning of realizable rules as well as of unrealizable rules is considered. In the latter case, the target rule cannot be perfectly realized by a network of the given architecture. Two useful approximate theories of learning from examples are studied: the high-temperature limit and the annealed approximation. Exact treatment of the quenched disorder generated by the random sampling of the examples leads to the use of the replica theory. Of primary interest is the generalization curve, namely, the average generalization error epsilon(g) versus the number of examples P used for training. The theory implies that, for a reduction in epsilon(g) that remains finite in the large-N limit, P should generally scale as alpha-N, where N is the number of independently adjustable weights in the network. We show that for smooth networks, i.e., those with continuously varying weights and smooth transfer functions, the generalization curve asymptotically obeys an inverse power law. In contrast, for nonsmooth networks other behaviors can appear, depending on the nature of the nonlinearities as well as the realizability of the rule. In particular, a discontinuous learning transition from a state of poor to a state of perfect generalization can occur in nonsmooth networks learning realizable rules. We illustrate both gradual and discontinuous learning with a detailed analytical and numerical study of several single-layer perceptron models. Comparing with the exact replica theory of perceptron learning, we find that for realizable rules the high-temperature and annealed theories provide very good approximations to the generalization performance.
Assuming this to hold for multilayer networks as well, we propose a classification of possible asymptotic forms of learning curves in general realizable models. For unrealizable rules we find that the above approximations fail in general to predict correctly the shapes of the generalization curves. Another indication of the important role of quenched disorder for unrealizable rules is that the generalization error is not necessarily a monotonically increasing function of temperature. Also, unrealizable rules can possess genuine spin-glass phases indicative of degenerate minima separated by high barriers.
Pages: 6056-6091
Page count: 36
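The Gibbs-distribution training described in the abstract can be illustrated numerically. The following is a minimal sketch, not taken from the paper: the Metropolis update, step size, temperature, and network size are all assumptions chosen for speed. It samples a single-layer perceptron student from exp(-E/T), where the energy E counts training errors on P = alpha*N examples labeled by a random teacher, and estimates the generalization error as eps_g = arccos(R)/pi from the teacher-student overlap R:

```python
import numpy as np

rng = np.random.default_rng(0)

def gen_error(w_student, w_teacher):
    # Spherical perceptron: eps_g = arccos(R) / pi, where R is the
    # normalized overlap between student and teacher weight vectors.
    r = w_student @ w_teacher / (np.linalg.norm(w_student) * np.linalg.norm(w_teacher))
    return np.arccos(np.clip(r, -1.0, 1.0)) / np.pi

def train_errors(w, X, y):
    # Number of misclassified training examples (the Gibbs energy E).
    return int(np.sum(np.sign(X @ w) != y))

def gibbs_train(X, y, N, T=0.05, steps=20000):
    # Metropolis sampling of student weights from exp(-E/T); at low T
    # this concentrates on networks with minimal training error.
    w = rng.standard_normal(N)
    e = train_errors(w, X, y)
    for _ in range(steps):
        w_new = w + 0.1 * rng.standard_normal(N)
        e_new = train_errors(w_new, X, y)
        if e_new <= e or rng.random() < np.exp((e - e_new) / T):
            w, e = w_new, e_new
    return w

N = 20                              # number of adjustable weights
w_teacher = rng.standard_normal(N)  # realizable target rule
eps_by_alpha = {}
for alpha in (1, 5):
    P = alpha * N                   # examples scale as alpha * N
    X = rng.standard_normal((P, N))
    y = np.sign(X @ w_teacher)
    w = gibbs_train(X, y, N)
    eps_by_alpha[alpha] = gen_error(w, w_teacher)
    print(f"alpha={alpha}: eps_g ~ {eps_by_alpha[alpha]:.3f}")
```

For this realizable rule the measured generalization error should shrink as alpha = P/N grows, consistent with the abstract's point that P must scale with the number of adjustable weights N; recovering the asymptotic power law itself would require much larger N and averaging over many example sets.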