SELECTING CONCISE TRAINING SETS FROM CLEAN DATA

Cited by: 81
Authors
PLUTOWSKI, M
WHITE, H
Affiliations
[1] UNIV CALIF SAN DIEGO,INST NEURAL COMPUTAT,SAN DIEGO,CA 92103
[2] UNIV CALIF SAN DIEGO,DEPT ECON,SAN DIEGO,CA 92103
Source
IEEE TRANSACTIONS ON NEURAL NETWORKS | 1993 / Vol. 4 / No. 2
DOI
10.1109/72.207618
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We derive a method for selecting exemplars for training a multilayer feedforward network architecture to estimate an unknown (deterministic) mapping from clean data, i.e., data measured either without error or with negligible error. Active data selection chooses from a given set of available examples a concise subset of training "exemplars." In practice, this amounts to incrementally growing the training set as necessary to achieve the desired level of accuracy. Our selection criterion does not depend on using a neural network estimator; thus it may be used for general-purpose nonlinear regression with any statistical estimator. The objective is to minimize the data requirement of learning. In a particular sense, we are performing a kind of data compression, by selecting exemplars representative of the set of all available examples. Towards this end, we choose a criterion for selecting training examples that works well in conjunction with the criterion used for learning, here, least squares. We proceed sequentially, selecting an example that, when added to the previous set of training examples and learned, maximizes the decrement of network squared error over the input space. When dealing with clean data and deterministic relationships, we desire concise training sets that minimize the Integrated Squared Bias (ISB). We use the ISB to derive a selection criterion for evaluating individual training examples, the ΔISB, which we maximize to select new exemplars. We conclude with graphical illustrations of the method, and demonstrate its use during network training. Several benefits are apparent for practical use in a variety of applications. Experimental results indicate that training upon exemplars selected in this fashion can save computation in general-purpose use as well.
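The sequential selection loop described above can be sketched in code. This is a minimal illustration, not the paper's method: it substitutes a polynomial least-squares estimator for the neural network (the abstract notes the criterion is estimator-agnostic), and it uses the pool-averaged squared error as a stand-in for the integrated squared bias rather than reproducing the ΔISB criterion itself. The function name `select_exemplars` and all parameters are assumptions for this sketch.

```python
import numpy as np

def select_exemplars(x_pool, y_pool, n_select, degree=3):
    """Greedy sequential exemplar selection from clean data.

    At each step, try adding each remaining candidate, refit the
    estimator (here a least-squares polynomial) on the enlarged
    training set, and keep the candidate that most reduces squared
    error averaged over the whole pool -- a proxy for the decrement
    of squared error over the input space.
    """
    # Seed with the example farthest from the pool mean response.
    selected = [int(np.argmax(np.abs(y_pool - y_pool.mean())))]
    remaining = set(range(len(x_pool))) - set(selected)
    while len(selected) < n_select:
        best_idx, best_err = None, np.inf
        for i in remaining:
            trial = selected + [i]
            # Cap the degree so the fit stays determined with few points.
            coeffs = np.polyfit(x_pool[trial], y_pool[trial],
                                min(degree, len(trial) - 1))
            err = np.mean((np.polyval(coeffs, x_pool) - y_pool) ** 2)
            if err < best_err:
                best_idx, best_err = i, err
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected
```

Because each addition refits the estimator and scores the full pool, the loop costs O(n_select · |pool|) fits; the paper's ΔISB criterion is designed to avoid such exhaustive retraining.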
Pages: 305 - 318
Page count: 14