Comparison of model selection for regression

Cited by: 116
Authors
Cherkassky, V. [1]
Ma, Y. Q. [1]
Affiliation
[1] Univ Minnesota, Dept Elect & Comp Engn, Minneapolis, MN 55455 USA
DOI
10.1162/089976603321891864
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
We discuss empirical comparisons of analytical methods for model selection. Currently, there is no consensus on the best method for finite-sample estimation problems, even for the simple case of linear estimators. This article presents empirical comparisons between classical statistical methods, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), and the structural risk minimization (SRM) method, based on Vapnik-Chervonenkis (VC) theory, for regression problems. Our study is motivated by empirical comparisons in Hastie, Tibshirani, and Friedman (2001), which claim that the SRM method performs poorly for model selection and suggest that AIC yields superior predictive performance. Hence, we present empirical comparisons for various data sets and different types of estimators (linear, subset selection, and k-nearest neighbor regression). Our results demonstrate the practical advantages of VC-based model selection: it consistently outperforms AIC on all data sets. In our study, the SRM and BIC methods show similar predictive performance. This discrepancy (between empirical results obtained using the same data) is caused by methodological drawbacks in Hastie et al. (2001), especially their loose interpretation and application of the SRM method. Hence, we discuss methodological issues important for meaningful comparisons and for practical application of the SRM method. We also point out the importance of accurate estimation of model complexity (VC dimension) for empirical comparisons and propose a new practical estimate of model complexity for k-nearest neighbor regression.
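
To make the abstract's comparison concrete, the sketch below selects the degree of a polynomial regression by minimizing each of the three criteria. It is a minimal illustration, not the paper's experimental protocol: the Gaussian-noise forms of AIC and BIC, the particular VC penalization factor, and the use of the parameter count d as the VC dimension h (a common choice for estimators linear in parameters) are all assumptions made here for demonstration.

    import numpy as np

    def poly_fit_mse(x, y, degree):
        # Least-squares polynomial fit; returns training MSE and the
        # number of free parameters d = degree + 1.
        coeffs = np.polyfit(x, y, degree)
        residuals = y - np.polyval(coeffs, x)
        return np.mean(residuals ** 2), degree + 1

    def aic(mse, d, n):
        # Gaussian-noise AIC up to additive constants: n*ln(MSE) + 2*d.
        return n * np.log(mse) + 2 * d

    def bic(mse, d, n):
        # BIC replaces AIC's 2*d penalty with d*ln(n).
        return n * np.log(mse) + d * np.log(n)

    def srm(mse, h, n):
        # SRM-style penalized risk: empirical risk divided by a penalization
        # factor that shrinks as p = h/n grows. This particular factor is one
        # form used in the VC-theory regression literature; it is an
        # assumption here, not a quotation of the paper's exact formula.
        p = h / n
        denom = 1.0 - np.sqrt(p - p * np.log(p) + np.log(n) / (2 * n))
        return mse / denom if denom > 0 else np.inf

    rng = np.random.default_rng(0)
    n = 30
    x = rng.uniform(-1.0, 1.0, n)
    y = np.sin(np.pi * x) + 0.2 * rng.standard_normal(n)

    for name, criterion in [("AIC", aic), ("BIC", bic), ("SRM", srm)]:
        scores = {}
        for degree in range(1, 9):
            mse, d = poly_fit_mse(x, y, degree)
            scores[degree] = criterion(mse, d, n)  # VC dimension taken as d
        print(name, "selects degree", min(scores, key=scores.get))

On small samples, the heavier penalties of BIC and of the VC factor tend to select lower-degree (simpler) models than AIC, which is consistent with the abstract's finding that SRM and BIC behave similarly while differing from AIC.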
Pages: 1691-1714
Page count: 24
Related papers (15 in total)
[1] Akaike, H. Statistical predictor identification [J]. Annals of the Institute of Statistical Mathematics, 1970, 22(2): 203-217.
[2] Bishop, C. M. Neural networks for pattern recognition [M]. 1995.
[3] Chapelle, O.; Vapnik, V.; Bengio, Y. Model selection for small sample regression [J]. Machine Learning, 2002, 48(1-3): 9-23.
[4] Cherkassky, V.; Shao, X.; Mulier, F. M.; Vapnik, V. N. Model complexity control for regression using VC generalization bounds [J]. IEEE Transactions on Neural Networks, 1999, 10(5): 1075-1089.
[5] Cherkassky, V.; Shao, X. Signal estimation and denoising using VC-theory [J]. Neural Networks, 2001, 14(1): 37-52.
[6] Cherkassky, V.; Kilts, S. Myopotential denoising of ECG signals using wavelet thresholding methods [J]. Neural Networks, 2001, 14(8): 1129-1137.
[7] Cherkassky, V.; Mulier, F. Learning from data: concepts, theory, and methods [M]. 1st ed. 1998.
[8] Härdle, W. Applied nonparametric regression [M]. 1995.
[9] Hart, P. E. Pattern classification [M]. 2006.
[10] Hastie, T.; Tibshirani, R.; Friedman, J. The elements of statistical learning [M]. 2nd ed. 2008. DOI: 10.1007/978-0-387-21606-5.