The lack of A priori distinctions between learning algorithms

被引：902

作者：

Wolpert, DH

机构：

[1] Santa Fe Institute, Santa Fe, NM 87501

来源：

NEURAL COMPUTATION | 1996年 / 8卷 / 07期

关键词：

D O I：

10.1162/neco.1996.8.7.1341

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

This is the first of two papers that use off-training set (OTS) error to investigate the assumption-free relationship between learning algorithms. This first paper discusses the senses in which there are no a priori distinctions between learning algorithms. (The second paper discusses the senses in which there are such distinctions.) In this first paper it is shown, loosely speaking, that for any two algorithms A and B, there are ''as many'' targets (or priors over targets) for which A has lower expected OTS error than B as vice versa, for loss functions like zero-one loss. In particular, this is true if A is cross-validation and B is ''anti-cross-validation'' (choose the learning algorithm with largest cross-validation error). This paper ends with a discussion of the implications of these results for computational learning theory. It is shown that one cannot say: if empirical misclassification rate is low, the Vapnik-Chervonenkis dimension of your generalizer is small, and the training set is large, then with high probability your OTS error is small. Other implications for ''membership queries'' algorithms and ''punting'' algorithms are also discussed.

引用

页码：1341 / 1390

页数：50

共 33 条

[1] Anthony Martin., 1992, COMPUTATIONAL LEARNI
[2] Berger J.O., 1985, STAT DECISION THEORY
[3] Bernardo J. M., 1994, BAYESIAN THEORY
[4] OCCAM RAZOR
BLUMER, A
EHRENFEUCHT, A
HAUSSLER, D
WARMUTH, MK
[J]. INFORMATION PROCESSING LETTERS, 1987, 24 (06) : 377 - 380
[5] LEARNABILITY AND THE VAPNIK-CHERVONENKIS DIMENSION
BLUMER, A
EHRENFEUCHT, A
HAUSSLER, D
WARMUTH, MK
[J]. JOURNAL OF THE ACM, 1989, 36 (04) : 929 - 965
[6] Bridle J. S., 1990, Neurocomputing, Algorithms, Architectures and Applications. Proceedings of the NATO Advanced Research Workshop, P227
[7] DIETTERICH TG, 1989, ANNU REV COMPUT SCI, V4, P255
[8] DRUCKER H, 1993, NEURAL INFORMATION P
[9] DUDA R. O., 2000, Pattern Classification and Scene Analysis
[10] ON MEAN ACCURACY OF STATISTICAL PATTERN RECOGNIZERS
HUGHES, GF
[J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 1968, 14 (01) : 55 - +

← 1 2 3 4 →