We derive a method for selecting exemplars for training a multilayer feedforward network architecture to estimate an unknown (deterministic) mapping from clean data, i.e., data measured either without error or with negligible error. Active data selection chooses from a given set of available examples a concise subset of training "exemplars." In practice, this amounts to incrementally growing the training set as necessary to achieve the desired level of accuracy. Our selection criterion does not depend on using a neural network estimator, and may therefore be used for general-purpose nonlinear regression with any statistical estimator. The objective is to minimize the data requirement of learning. In a particular sense, we perform a kind of data compression, selecting exemplars representative of the set of all available examples. Toward this end, we choose a criterion for selecting training examples that works well in conjunction with the criterion used for learning, here, least squares. We proceed sequentially, selecting the example that, when added to the previous set of training examples and learned, maximizes the decrement of network squared error over the input space. When dealing with clean data and deterministic relationships, we desire concise training sets that minimize the Integrated Squared Bias (ISB). We use the ISB to derive a selection criterion for evaluating individual training examples, the ΔISB, which we maximize to select new exemplars. We conclude with graphical illustrations of the method and demonstrate its use during network training. Several benefits are apparent for practical use in a variety of applications. Experimental results indicate that training upon exemplars selected in this fashion can save computation in general-purpose use as well.
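
The sequential selection rule described above can be illustrated with a minimal brute-force sketch. This is not the paper's derived ΔISB criterion, only a direct instantiation of the stated rule: at each step, the candidate whose addition to the training set most decreases the squared error over the available examples is selected. The names fit, predict, and the pool-average approximation of the ISB are illustrative assumptions, not part of the original method.

    import numpy as np

    def pool_squared_error(model, pool_x, pool_y):
        # Approximate the integrated squared bias (ISB) by the mean squared
        # error over the full pool of available (clean) examples; with
        # noiseless, deterministic data the bias is simply the gap between
        # the target values and the fitted estimator.
        return np.mean((pool_y - model.predict(pool_x)) ** 2)

    def select_next_exemplar(chosen, pool_x, pool_y, fit):
        # Greedy step: return the pool index whose addition to the current
        # training set maximizes the decrement of squared error (ΔISB).
        # `fit` is any least-squares estimator (neural network or otherwise)
        # returning an object with a .predict method -- an assumption here.
        base_model = fit(pool_x[chosen], pool_y[chosen])
        base = pool_squared_error(base_model, pool_x, pool_y)
        best_i, best_delta = None, -np.inf
        for i in range(len(pool_x)):
            if i in chosen:
                continue
            trial = chosen + [i]
            model = fit(pool_x[trial], pool_y[trial])  # retrain on augmented set
            delta = base - pool_squared_error(model, pool_x, pool_y)
            if delta > best_delta:
                best_i, best_delta = i, delta
        return best_i  # argmax of the error decrement over remaining candidates

Retraining on every candidate, as done here, is expensive; the point of deriving an explicit ΔISB criterion is to score individual candidate examples directly, which is where the computational savings reported in the experiments come from.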