What Size Net Gives Valid Generalization?

Cited by: 910

Authors:

Baum, Eric B. [1]

Haussler, David [2]

Affiliations:

[1] CALTECH, Jet Prop Lab, Pasadena, CA 91109 USA

[2] Univ Calif Santa Cruz, Dept Comp & Informat Scie, Santa Cruz, CA 95064 USA

Source:

NEURAL COMPUTATION | 1989 / Vol. 1 / No. 01

Funding:

National Aeronautics and Space Administration (NASA);

Keywords:

DOI:

10.1162/neco.1989.1.1.151

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We address the question of when a network can be expected to generalize from $m$ random training examples chosen from some arbitrary probability distribution, assuming that future test examples are drawn from the same distribution. Among our results are the following bounds on appropriate sample vs. network size. Assume $0 < \epsilon \le 1/8$. We show that if $m \ge O\!\left(\frac{W}{\epsilon} \log \frac{N}{\epsilon}\right)$ random examples can be loaded on a feedforward network of linear threshold functions with $N$ nodes and $W$ weights, so that at least a fraction $1 - \epsilon/2$ of the examples are correctly classified, then one has confidence approaching certainty that the network will correctly classify a fraction $1 - \epsilon$ of future test examples drawn from the same distribution. Conversely, for fully-connected feedforward nets with one hidden layer, any learning algorithm using fewer than $\Omega(W/\epsilon)$ random training examples will, for some distributions of examples consistent with an appropriate weight choice, fail at least some fixed fraction of the time to find a weight choice that will correctly classify more than a $1 - \epsilon$ fraction of the future test examples.
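
As a rough illustration of how the two bounds in the abstract scale with network size, the sketch below evaluates the expressions (W/ε)·log(N/ε) and W/ε. The abstract gives only asymptotic forms, so the function names, the constant factor `c` (set to 1 here), and the choice of natural logarithm are all assumptions made for illustration, not values taken from the paper.

```python
import math


def sufficient_examples_order(W, N, eps, c=1.0):
    """Scale of the sufficient sample size: c * (W / eps) * ln(N / eps).

    W: number of weights, N: number of nodes, eps: target error rate.
    c is a hypothetical placeholder for the constant hidden by O(.).
    """
    if not (0 < eps <= 1 / 8):
        raise ValueError("the abstract assumes 0 < eps <= 1/8")
    return c * (W / eps) * math.log(N / eps)


def necessary_examples_order(W, eps, c=1.0):
    """Scale of the lower bound: c * W / eps.

    c is a hypothetical placeholder for the constant hidden by Omega(.).
    """
    if not (0 < eps <= 1 / 8):
        raise ValueError("the abstract assumes 0 < eps <= 1/8")
    return c * W / eps


if __name__ == "__main__":
    # Example: a net with 100 weights and 10 nodes, target error 10%.
    W, N, eps = 100, 10, 0.1
    print("sufficient (order of):", round(sufficient_examples_order(W, N, eps)))
    print("necessary  (order of):", round(necessary_examples_order(W, eps)))
```

With the constants set to 1, the two quantities differ only by the factor log(N/ε), which is the gap between the upper and lower bounds stated in the abstract.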

Pages: 151-160

Page count: 10

