IMPROVING MODEL SELECTION BY NONCONVERGENT METHODS

Cited by: 142
Authors
FINNOFF, W
HERGERT, F
ZIMMERMANN, HG
Keywords
MODEL SELECTION; GENERALIZATION; WEIGHT PRUNING; PENALTY TERMS; CROSS-VALIDATION; NONCONVERGENT TRAINING; DYNAMIC TOPOLOGY MODIFICATIONS;
DOI
10.1016/S0893-6080(05)80122-4
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
Many techniques for model selection in the field of neural networks correspond to well established statistical methods. For example, architecture modifications based on test variables calculated after convergence of the training process can be viewed as part of a hypothesis testing search, and the use of complexity penalty terms is essentially a type of regularization or biased regression. The method of "stopped" or "cross-validation" training, on the other hand, in which an oversized network is trained until the error on a further validation set of examples deteriorates, then training is stopped, is a true innovation since model selection doesn't require convergence of the training process. Here, the training process is used to perform a directed search of the parameter space for a model which doesn't overfit the data and thus demonstrates superior generalization performance. In this paper we show that this performance can be significantly enhanced by expanding the "nonconvergent method" of stopped training to include dynamic topology modifications (dynamic weight pruning) and modified complexity penalty term methods in which the weighting of the penalty term is adjusted during the training process. On an extensive sequence of simulation examples we demonstrate the general superiority of the "extended" nonconvergent methods compared to classical penalty term methods, simple stopped training, and methods which only vary the number of hidden units.
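As an illustration of the "stopped" training idea described in the abstract, the following is a minimal sketch only, not the authors' implementation. It fits a deliberately oversized polynomial model by gradient descent and keeps the weights from the epoch with the lowest validation error. The polynomial model, the synthetic data, the `patience` rule, and all names (`stopped_training`, `make_data`, and so on) are assumptions made for this example; the dynamic weight pruning and adaptive penalty terms studied in the paper are not shown.

```python
import random

def predict(weights, x):
    # Polynomial model: sum_k weights[k] * x**k (oversized on purpose)
    return sum(w * x ** k for k, w in enumerate(weights))

def mse(weights, data):
    # Mean squared error over a list of (x, y) pairs
    return sum((predict(weights, x) - y) ** 2 for x, y in data) / len(data)

def gradient_step(weights, data, lr):
    # One full-batch gradient descent step on the mean squared error
    grads = [0.0] * len(weights)
    for x, y in data:
        err = predict(weights, x) - y
        for k in range(len(weights)):
            grads[k] += 2.0 * err * x ** k / len(data)
    return [w - lr * g for w, g in zip(weights, grads)]

def stopped_training(train, valid, n_weights=8, lr=0.05,
                     max_epochs=2000, patience=20):
    # Assumed stopping rule: halt once the validation error has not
    # improved for `patience` consecutive epochs, and return the weights
    # from the best epoch rather than the last one.
    weights = [0.0] * n_weights
    best_weights, best_err, best_epoch = list(weights), mse(weights, valid), 0
    history = [best_err]
    for epoch in range(1, max_epochs + 1):
        weights = gradient_step(weights, train, lr)
        err = mse(weights, valid)
        history.append(err)
        if err < best_err:
            best_weights, best_err, best_epoch = list(weights), err, epoch
        elif epoch - best_epoch >= patience:
            break
    return best_weights, best_err, best_epoch, history

def make_data(n, rng):
    # Hypothetical synthetic task: a noisy linear target on [-1, 1]
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 0.5 * x + rng.gauss(0.0, 0.2)))
    return data

rng = random.Random(0)
train, valid = make_data(15, rng), make_data(15, rng)
best_weights, best_err, best_epoch, history = stopped_training(train, valid)
print("best epoch:", best_epoch, "validation MSE:", best_err)
```

The point of the sketch is that model selection happens along the training trajectory: the returned weights come from an intermediate, nonconverged state of the optimization, chosen by the validation error rather than by convergence on the training set.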
Pages: 771-783
Number of pages: 13