Model selection and error estimation

Cited by: 202
Authors
Bartlett, PL
Boucheron, S
Lugosi, G
Affiliations
[1] BIOwulf Technol, Berkeley, CA 94704 USA
[2] Univ Paris 11, CNRS, Rech Informat Lab, F-91405 Orsay, France
[3] Pompeu Fabra Univ, Dept Econ, Barcelona 08005, Spain
Funding
Australian Research Council;
Keywords
model selection; penalization; concentration inequalities; empirical penalties;
DOI
10.1023/A:1013999503812
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study model selection strategies based on penalized empirical loss minimization. We point out a tight relationship between error estimation and data-based complexity penalization: any good error estimate may be converted into a data-based penalty function and the performance of the estimate is governed by the quality of the error estimate. We consider several penalty functions, involving error estimates on independent test data, empirical VC dimension, empirical VC entropy, and margin-based quantities. We also consider the maximal difference between the error on the first half of the training data and the second half, and the expected maximal discrepancy, a closely related capacity estimate that can be calculated by Monte Carlo integration. Maximal discrepancy penalty functions are appealing for pattern classification problems, since their computation is equivalent to empirical risk minimization over the training data with some labels flipped.
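The label-flipping equivalence described at the end of the abstract can be sketched in a few lines. The sketch below is illustrative only: `erm_stump` is a hypothetical ERM learner over 1-D threshold classifiers (not from the paper), standing in for any empirical risk minimizer over the model class. Because `1{f(x) != y} = 1 - 1{f(x) != -y}`, the maximal difference between the empirical errors on the two half-samples equals one minus twice the minimal empirical error on the sample with the first half's labels flipped.

```python
import numpy as np

def erm_stump(X, y):
    # Hypothetical ERM learner used for illustration: exhaustive search over
    # 1-D threshold classifiers h(x) = s * sign(x - t), with s in {-1, +1}.
    # Returns the minimal empirical 0-1 error over this class.
    best_err = 1.0
    for t in np.concatenate(([X.min() - 1.0], X)):
        for s in (-1, 1):
            pred = s * np.where(X > t, 1, -1)
            best_err = min(best_err, float(np.mean(pred != y)))
    return best_err

def maximal_discrepancy(X, y, erm=erm_stump):
    # Flip the labels on the first half of the sample and run ERM on the
    # relabeled data; the maximal discrepancy between the half-sample errors
    # is then 1 - 2 * (minimal empirical error on the flipped sample).
    y_flipped = y.copy()
    y_flipped[: len(y) // 2] *= -1
    return 1.0 - 2.0 * erm(X, y_flipped)

# Two half-samples with opposite labels are maximally distinguishable:
X = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([-1, -1, 1, 1])
print(maximal_discrepancy(X, y))  # -> 1.0
```

A single ERM run on the relabeled data thus yields the penalty; the expected maximal discrepancy mentioned in the abstract can be approximated by averaging this quantity over random reshufflings of the sample (Monte Carlo integration).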
Pages: 85-113
Page count: 29
Related papers
52 entries in total
[1] Akaike, H. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 1974, AC-19(6): 716-723.
[2] [Anonymous], 2000, P 2 ICSC S NEUR COMP.
[3] [Anonymous], 1982, ESTIMATION DEPENDENC.
[4] Barron, A.; Birgé, L.; Massart, P. Risk bounds for model selection via penalization. Probability Theory and Related Fields, 1999, 113(3): 301-413.
[5] Barron, A.R.; Cover, T.M. Minimum complexity density estimation. IEEE Transactions on Information Theory, 1991, 37(4): 1034-1054.
[6] Barron, A.R., 1991, NATO ADV SCI I C-MAT, V335, P561.
[7] Barron, A.R., 1985, 56 TR STANF U DEP ST.
[8] Bartlett, P., 1999, Advances in Kernel Methods, P43.
[9] Bartlett, P.L. The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Transactions on Information Theory, 1998, 44(2): 525-536.
[10] Birgé, L.; Massart, P. Minimum contrast estimators on sieves: exponential bounds and rates of convergence. Bernoulli, 1998, 4(3): 329-375.