Regularization learning, early stopping and biased estimator

Cited by: 21
Authors
Hagiwara, K [1]
Affiliations
[1] Mie Univ, Fac Phys Engn, Tsu, Mie 5148507, Japan
Keywords
regularization learning; early stopping; biased estimator; shrinkage estimator; bias/variance dilemma;
DOI
10.1016/S0925-2312(01)00681-6
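Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 [Pattern recognition and intelligent systems]; 0812 [Computer science and technology]; 0835 [Software engineering]; 1405 [Intelligence science and technology];
Abstract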
In this article, we present a unified statistical interpretation of regularization learning and early stopping for linear networks in the context of statistical regression, i.e., the linear regression model. First, both techniques are shown to be equivalent to using a biased estimator, with the aim of constructing a network whose generalization error is lower than that of the least-squares estimator. This biased estimator is also found to be a shrinkage estimator. Second, we show that the optimal regularization parameter and the optimal stopping time with respect to the generalization error are obtained by resolving the bias/variance dilemma. Third, we give estimates of the optimal regularization parameter and the optimal stopping time based on the training data. Simple numerical simulations show that these estimates can improve the generalization error relative to the least-squares estimator. Finally, we discuss the relationship between the Bayesian interpretation of the regularization parameter and the optimal regularization parameter that minimizes the generalization error. (C) 2002 Elsevier Science B.V. All rights reserved.
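The shrinkage view described in the abstract can be illustrated with a minimal sketch. It is not taken from the article: it assumes the standard textbook setting in which the squared-error loss has been rotated into the eigenbasis of the input covariance, so that each direction i has an eigenvalue d_i and a least-squares coefficient w_ls_i. Under that assumption, an L2 penalty and gradient descent started from zero both multiply w_ls_i by a factor between 0 and 1. All function names and numeric values below are hypothetical and chosen only for illustration.

```python
def ridge_factors(eigvals, lam):
    """Per-direction shrinkage of the least-squares solution under an L2 penalty.

    Minimizing 0.5*d*(w - w_ls)**2 + 0.5*lam*w**2 gives w = d/(d + lam) * w_ls.
    """
    return [d / (d + lam) for d in eigvals]


def early_stopping_factors(eigvals, step, n_steps):
    """Per-direction shrinkage after n_steps of gradient descent started at zero."""
    return [1.0 - (1.0 - step * d) ** n_steps for d in eigvals]


def gradient_descent(eigvals, w_ls, step, n_steps):
    """Run gradient descent on 0.5 * sum_i d_i * (w_i - w_ls_i)**2 from w = 0."""
    w = [0.0] * len(w_ls)
    for _ in range(n_steps):
        w = [wi - step * d * (wi - wl) for wi, d, wl in zip(w, eigvals, w_ls)]
    return w


def shrinkage_risk(factor, w_true, var_ls):
    """Squared bias plus variance of factor * w_ls in one direction.

    Assumes w_ls is unbiased for w_true with variance var_ls.
    """
    return (1.0 - factor) ** 2 * w_true ** 2 + factor ** 2 * var_ls


# Illustrative values only (assumptions, not data from the article).
eigvals = [2.0, 0.5, 0.05]
w_ls = [1.0, -2.0, 3.0]
step, n_steps = 0.4, 10

w_gd = gradient_descent(eigvals, w_ls, step, n_steps)
stop_factors = early_stopping_factors(eigvals, step, n_steps)
ridge = ridge_factors(eigvals, lam=0.1)

# Bias/variance trade-off in a single direction: the factor that minimizes
# the risk is w_true**2 / (w_true**2 + var_ls), which is below 1.
w_true, var_ls = 1.0, 0.5
best_factor = w_true ** 2 / (w_true ** 2 + var_ls)
```

In this sketch, directions with small eigenvalues are shrunk most strongly by both methods, and the stopping time plays a role analogous to the inverse of the regularization parameter. The final lines show why a biased estimator can have lower risk than least squares: accepting some bias reduces the variance term by more than the bias term grows.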
Pages: 937-955
Number of pages: 19
Related papers
20 items in total
[2]
Natural gradient works efficiently in learning [J].
Amari, S .
NEURAL COMPUTATION, 1998, 10 (02) :251-276
[3]
Asymptotic statistical theory of overtraining and cross-validation [J].
Amari, S ;
Murata, N ;
Muller, KR ;
Finke, M ;
Yang, HH .
IEEE TRANSACTIONS ON NEURAL NETWORKS, 1997, 8 (05) :985-996
[4]
Barron A., 1984, SELF ORG METHODS MOD, P87
[5]
Bishop C. M., 1995, NEURAL NETWORKS PATT
[6]
Breiman L., 1998, Neural Networks and Machine Learning. Proceedings, P27
[7]
BUNTINE WL, 1991, COMPLEX SYST, V5, P877
[8]
No free lunch for early stopping [J].
Cataltepe, Z ;
Abu-Mostafa, YS ;
Magdon-Ismail, M .
NEURAL COMPUTATION, 1999, 11 (04) :995-1009
[9]
COPAS JB, 1983, J R STAT SOC B, V45, P311
[10]
NEURAL NETWORKS AND THE BIAS VARIANCE DILEMMA [J].
GEMAN, S ;
BIENENSTOCK, E ;
DOURSAT, R .
NEURAL COMPUTATION, 1992, 4 (01) :1-58