BROKEN SYMMETRIES IN MULTILAYERED PERCEPTRONS

Cited by: 90
Authors
BARKAI, E
HANSEL, D
SOMPOLINSKY, H
Affiliations
[1] HEBREW UNIV JERUSALEM, RACAH INST PHYS, IL-91904 JERUSALEM, ISRAEL
[2] ECOLE POLYTECH, CTR PHYS THEOR, F-91128 PALAISEAU, FRANCE
Source
PHYSICAL REVIEW A | 1992 / Vol. 45 / No. 06
Keywords
DOI
10.1103/PhysRevA.45.4146
Chinese Library Classification number
O43 [Optics];
Discipline classification code
070207 ; 0803 ;
Abstract
The statistical mechanics of two-layered perceptrons with N input units, K hidden units, and a single output unit that makes a decision based on a majority rule (committee machine) is studied. Two architectures are considered. In the nonoverlapping case the hidden units do not share common inputs. In the fully connected case each hidden unit is connected to the entire input layer. In both cases the network realizes a random dichotomy of P inputs. The statistical properties of the space of solutions as a function of P are studied, using the replica method and numerical simulations, in the regime where N >> K. In the nonoverlapping architecture with continuously varying weights, the capacity α_c, defined as the maximal value of P per weight, is calculated under a replica-symmetric (RS) ansatz. At large K, α_c diverges as K^(1/2), in contradiction with the rigorous upper bound α_c < C ln K, where C is a proportionality constant, derived by Mitchison and Durbin [Biol. Cybern. 60, 345 (1989)]. This suggests a strong replica-symmetry-breaking effect. The instability of the RS solution is shown to occur at a value of α which remains finite in the large-K limit. A one-step replica-symmetry-breaking (RSB) ansatz is studied for K = 3 and in the limit K → ∞. The results indicate that α_c(K) diverges with K, probably logarithmically. The occurrence of RSB far below the capacity limit is confirmed by comparison of the theoretical results with numerical simulations for K = 3. This symmetry breaking implies that, unlike the single-layer perceptron case, the space of solutions of the two-layer perceptron breaks, beyond a critical value of α, into many disjoint subregions. The entropies of the connected subregions are almost degenerate, their relative difference being of order 1/N. In the case of a nonoverlapping committee machine with binary (±1) weights, α_c ≤ 1 is an upper bound for all K. The RS theory predicts α_c = 0.92 for K = 3 and α_c = 0.95 for the large-K limit. The theoretical prediction for K = 3 is in excellent agreement with the numerical estimate based on an exhaustive search in the space of solutions for small N. These results indicate that in the binary case there is no RSB in the space of solutions below the maximal capacity. In the fully connected architecture, the solution's phase space has a global permutation symmetry (PS) reflecting the invariance under permuting the hidden units. The order parameters that signal the spontaneous breaking of this symmetry are defined. The replica-symmetric theory shows that for small α the PS is maintained. For larger values of α < α_c the symmetry is broken, implying the breaking of the solution space into disjoint regions. These regions are related by permutation symmetry, hence they are fully degenerate with respect to their entropies and statistical properties. This prediction has been tested by simulations of the K = 3 case, calculating the order parameters by random walks in the space of solutions. They yield good evidence for the existence of a phase with broken permutation symmetry at values of α ≥ 2. Finally, both theory and simulations show that for a typical fully connected network the connections joining the same input to a pair of hidden units are negatively correlated.
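As an illustration of the two architectures described in the abstract, the following is a minimal sketch of the majority-rule output of a committee machine. It is not taken from the paper: the function names, the weight-array layouts, and the use of NumPy are assumptions made here for illustration, and K is assumed odd so that the majority vote is never tied.

```python
import numpy as np

def fully_connected_output(W, x):
    """Committee-machine output when every hidden unit sees all N inputs.

    W: array of shape (K, N), one weight vector per hidden unit (assumed layout).
    x: array of shape (N,), a single input pattern.
    """
    hidden = np.sign(W @ x)            # sign of each hidden unit's local field
    return int(np.sign(hidden.sum()))  # majority vote (K assumed odd)

def nonoverlapping_output(W, x):
    """Committee-machine output when hidden units do not share inputs.

    W: array of shape (K, N // K); hidden unit k sees only its own block
       of N // K consecutive inputs (assumed layout).
    x: array of shape (N,), with N divisible by K.
    """
    K, block = W.shape
    fields = np.sum(W * x.reshape(K, block), axis=1)  # one field per block
    return int(np.sign(np.sign(fields).sum()))        # majority vote

def realizes_dichotomy(output_fn, W, X, y):
    """True if the network reproduces every target label in y on the patterns X."""
    return all(output_fn(W, x) == t for x, t in zip(X, y))
```

Because the output depends on the hidden units only through the sum of their signs, relabeling the hidden units (permuting the rows of `W`) in the fully connected case leaves the output unchanged, which is the permutation symmetry the abstract refers to. Zero local fields are not handled specially in this sketch; with continuous random weights they occur with probability zero.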
Pages: 4146-4161
Number of pages: 16
Related papers
16 in total
[1]   STATISTICAL-MECHANICS OF A MULTILAYERED NEURAL NETWORK [J].
BARKAI, E ;
HANSEL, D ;
KANTER, I .
PHYSICAL REVIEW LETTERS, 1990, 65 (18) :2312-2315
[2]  
BARKAI E, UNPUB
[3]  
Baum E. B., 1988, Journal of Complexity, V4, P193, DOI 10.1016/0885-064X(88)90020-9
[4]   GEOMETRICAL AND STATISTICAL PROPERTIES OF SYSTEMS OF LINEAR INEQUALITIES WITH APPLICATIONS IN PATTERN RECOGNITION [J].
COVER, TM .
IEEE TRANSACTIONS ON ELECTRONIC COMPUTERS, 1965, EC14 (03) :326-&
[5]   RANDOM-ENERGY MODEL - AN EXACTLY SOLVABLE MODEL OF DISORDERED-SYSTEMS [J].
DERRIDA, B .
PHYSICAL REVIEW B, 1981, 24 (05) :2613-2626
[6]   OPTIMAL STORAGE PROPERTIES OF NEURAL NETWORK MODELS [J].
GARDNER, E ;
DERRIDA, B .
JOURNAL OF PHYSICS A-MATHEMATICAL AND GENERAL, 1988, 21 (01) :271-284
[7]   THE SPACE OF INTERACTIONS IN NEURAL NETWORK MODELS [J].
GARDNER, E .
JOURNAL OF PHYSICS A-MATHEMATICAL AND GENERAL, 1988, 21 (01) :257-270
[8]  
Gardner Elizabeth, 1989, J PHYS A, V22
[9]   THE SIMPLEST SPIN-GLASS [J].
GROSS, DJ ;
MEZARD, M .
NUCLEAR PHYSICS B, 1984, 240 (04) :431-452
[10]   CAPACITY OF NEURAL NETWORKS WITH DISCRETE SYNAPTIC COUPLINGS [J].
GUTFREUND, H ;
STEIN, Y .
JOURNAL OF PHYSICS A-MATHEMATICAL AND GENERAL, 1990, 23 (12) :2613-2630