BAYESIAN REGULARIZATION AND PRUNING USING A LAPLACE PRIOR

Cited by: 256
Author: WILLIAMS, PM
DOI: 10.1162/neco.1995.7.1.117
CLC classification: TP18 [Artificial intelligence theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Standard techniques for improved generalization from neural networks include weight decay and pruning. Weight decay has a Bayesian interpretation, with the decay function corresponding to a prior over weights. The method of transformation groups and maximum entropy suggests a Laplace rather than a Gaussian prior. After training, the weights then arrange themselves into two classes: (1) those with a common sensitivity to the data error, and (2) those failing to achieve this sensitivity, which therefore vanish. Since the critical value is determined adaptively during training, pruning (in the sense of setting weights to exact zeros) becomes an automatic consequence of regularization alone. The count of free parameters is also reduced automatically as weights are pruned. A comparison is made with results of MacKay using the evidence framework and a Gaussian regularizer.
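As a brief sketch of the mechanism summarized in the abstract (standard reasoning for a Laplace, i.e. L1, penalty; the symbols E_D for the data error, \alpha for the regularization constant, and w_i for an individual weight are notation introduced here, not quoted from the paper): a Laplace prior corresponds to minimizing a penalized cost

p(w) \propto \exp\Bigl(-\alpha \sum_i |w_i|\Bigr), \qquad M(w) = E_D(w) + \alpha \sum_i |w_i| .

At any minimum of M, the optimality (subgradient) conditions for the absolute-value terms give

\left|\frac{\partial E_D}{\partial w_i}\right| = \alpha \quad \text{if } w_i \neq 0, \qquad \left|\frac{\partial E_D}{\partial w_i}\right| \leq \alpha \quad \text{if } w_i = 0,

so nonzero weights share a common sensitivity \alpha to the data error, while weights that cannot sustain this sensitivity are held at exact zeros, which is the automatic pruning described above.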
Pages: 117-143
Page count: 27
References (23 in total)
[1] Bishop, C. M. (1993). Curvature-driven smoothing: A learning algorithm for feedforward networks. IEEE Transactions on Neural Networks, 4(5), 882-884.
[2] Denker, J. (1987). Complex Systems, 1, 877.
[3] Gill, P. E. (1981). Practical Optimization.
[4] Hassibi, B. (1993). Advances in Neural Information Processing Systems, p. 164.
[5] Jaynes, E. T. (1968). Prior probabilities. IEEE Transactions on Systems Science and Cybernetics, SSC-4(3), 227.
[6] LeCun, Y. (1990). Advances in Neural Information Processing Systems, p. 598.
[7] MacKay, D. J. C. (1992). A practical Bayesian framework for backpropagation networks. Neural Computation, 4(3), 448-472.
[8] MacKay, D. J. C. (1994). Maximum Entropy and Bayesian Methods.
[9] Møller, M. F. (1993). A scaled conjugate-gradient algorithm for fast supervised learning. Neural Networks, 6(4), 525-533.
[10] Møller, M. F. (1993). DAIMI PB-432, Aarhus University.