Metric-based methods for adaptive model selection and regularization

Cited by: 24
Authors
Schuurmans, D [1]
Southey, F [1]
Affiliation
[1] Univ Waterloo, Dept Comp Sci, Waterloo, ON N2L 3G1, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
model selection; regularization; unlabeled examples
DOI
10.1023/A:1013947519741
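Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 [Pattern recognition and intelligent systems]; 0812 [Computer science and technology]; 0835 [Software engineering]; 1405 [Intelligence science and technology]
Abstract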
We present a general approach to model selection and regularization that exploits unlabeled data to adaptively control hypothesis complexity in supervised learning tasks. The idea is to impose a metric structure on hypotheses by determining the discrepancy between their predictions across the distribution of unlabeled data. We show how this metric can be used to detect untrustworthy training error estimates, and devise novel model selection strategies that exhibit theoretical guarantees against over-fitting (while still avoiding under-fitting). We then extend the approach to derive a general training criterion for supervised learning, yielding an adaptive regularization method that uses unlabeled data to automatically set regularization parameters. This new criterion adjusts its regularization level to the specific set of training data received, and performs well on a variety of regression and conditional density estimation tasks. The only proviso for these methods is that sufficient unlabeled training data be available.
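The core idea in the abstract, measuring the distance between two hypotheses by how much their predictions disagree on unlabeled inputs, can be illustrated with a minimal sketch. The abstract does not specify the discrepancy measure, so the root-mean-square distance used here is an assumption, as are all function names. The triangle-inequality check is one plausible reading of "detect untrustworthy training error estimates", not necessarily the authors' exact procedure.

```python
import numpy as np

def discrepancy(f, g, unlabeled_x):
    """Assumed metric: RMS disagreement between two hypotheses on unlabeled inputs."""
    diff = f(unlabeled_x) - g(unlabeled_x)
    return float(np.sqrt(np.mean(diff ** 2)))

def training_distance(f, x, y):
    """RMS distance between a hypothesis and the observed training labels."""
    return float(np.sqrt(np.mean((f(x) - y) ** 2)))

def looks_untrustworthy(f, g, x, y, unlabeled_x):
    """Hypothetical check: if f and g are farther apart on unlabeled data than
    their two training errors allow under the triangle inequality, at least
    one training error is likely an underestimate of the true error."""
    bound = training_distance(f, x, y) + training_distance(g, x, y)
    return discrepancy(f, g, unlabeled_x) > bound

# Toy illustration: both hypotheses fit the single labeled point perfectly,
# yet they disagree strongly away from it.
f = lambda x: np.zeros_like(x, dtype=float)   # constant-zero hypothesis
g = lambda x: np.asarray(x, dtype=float)      # identity hypothesis
x_train, y_train = np.array([0.0]), np.array([0.0])
x_unlabeled = np.array([0.0, 1.0, 2.0])

flag = looks_untrustworthy(f, g, x_train, y_train, x_unlabeled)
print(flag)
```

In this toy case both training errors are zero, so any disagreement on the unlabeled inputs exceeds the bound and the pair is flagged. With realistic data, the unlabeled sample would need to be large enough for the empirical discrepancy to approximate the distance under the input distribution, which matches the proviso stated in the abstract.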
Pages: 51-84
Page count: 34