BAYESIAN REGULARIZATION AND PRUNING USING A LAPLACE PRIOR

Cited by: 256
Author: WILLIAMS, PM
DOI: 10.1162/neco.1995.7.1.117
CLC classification: TP18 [Artificial intelligence theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Standard techniques for improved generalization from neural networks include weight decay and pruning. Weight decay has a Bayesian interpretation, with the decay function corresponding to a prior over weights. The method of transformation groups and maximum entropy suggests a Laplace rather than a Gaussian prior. After training, the weights then arrange themselves into two classes: (1) those with a common sensitivity to the data error, and (2) those failing to achieve this sensitivity, which therefore vanish. Since the critical value is determined adaptively during training, pruning (in the sense of setting weights to exact zeros) becomes an automatic consequence of regularization alone. The count of free parameters is also reduced automatically as weights are pruned. A comparison is made with results of MacKay using the evidence framework and a Gaussian regularizer.
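As a brief sketch of the mechanism summarized in the abstract (standard reasoning for a Laplace, i.e. L1, penalty; the symbols E_D for the data error, \alpha for the regularization constant, and w_i for an individual weight are notation introduced here, not quoted from the paper): a Laplace prior corresponds to minimizing a penalized cost

p(w) \propto \exp\Bigl(-\alpha \sum_i |w_i|\Bigr), \qquad M(w) = E_D(w) + \alpha \sum_i |w_i| .

At any minimum of M, the optimality (subgradient) conditions for the absolute-value terms give

\left|\frac{\partial E_D}{\partial w_i}\right| = \alpha \quad \text{if } w_i \neq 0, \qquad \left|\frac{\partial E_D}{\partial w_i}\right| \leq \alpha \quad \text{if } w_i = 0,

so nonzero weights share a common sensitivity \alpha to the data error, while weights that cannot sustain this sensitivity are held at exact zeros, which is the automatic pruning described above.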
Pages: 117-143
Page count: 27
References (23 in total)
[1] Bishop, C. M. (1993). Curvature-driven smoothing: A learning algorithm for feedforward networks. IEEE Transactions on Neural Networks, 4(5), 882-884.
[2] Denker, J. (1987). Complex Systems, 1, 877.
[3] Gill, P. E. (1981). Practical Optimization.
[4] Hassibi, B. (1993). Advances in Neural Information Processing Systems, p. 164.
[5] Jaynes, E. T. (1968). Prior probabilities. IEEE Transactions on Systems Science and Cybernetics, SSC-4(3), 227.
[6] LeCun, Y. (1990). Advances in Neural Information Processing Systems, p. 598.
[7] MacKay, D. J. C. (1992). A practical Bayesian framework for backpropagation networks. Neural Computation, 4(3), 448-472.
[8] MacKay, D. J. C. (1994). Maximum Entropy and Bayesian Methods.
[9] Møller, M. F. (1993). A scaled conjugate-gradient algorithm for fast supervised learning. Neural Networks, 6(4), 525-533.
[10] Møller, M. F. (1993). DAIMI PB-432, Aarhus University.