Tuning variable selection procedures by adding noise

Cited by: 34
Authors
Luo, Xiaohui [1]
Stefanski, Leonard A.
Boos, Dennis D.
Affiliations
[1] Merck Res Labs, Clin Biostat, Rahway, NJ 07065 USA
[2] N Carolina State Univ, Dept Stat, Raleigh, NC 27695 USA
Keywords
Akaike information criterion; Bayes information criterion; forward selection; Mallows' C_p; model selection; regression; SIMEX
DOI
10.1198/004017005000000319
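Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract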
Many variable selection methods for linear regression depend critically on tuning parameters that control the performance of the method, for example, the "entry" and "stay" significance levels in forward and backward selection. However, most methods do not adapt the tuning parameters to particular datasets. We propose a general strategy for adapting variable selection tuning parameters that effectively estimates the tuning parameters so that the selection method avoids overfitting and underfitting. The strategy is based on the principle that overfitting and underfitting can be directly observed in estimates of the error variance after adding controlled amounts of additional independent noise to the response variable and then running a variable selection method. It is related to the simulation technique SIMEX found in the measurement error literature. We focus on forward selection because of its simplicity and ability to handle large numbers of explanatory variables. Monte Carlo studies show that the new method compares favorably with established methods.
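The principle stated in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure, since the abstract does not give their exact tuning rule. It only shows the mechanics: add independent noise with variance lam * sigma^2 to the response, rerun forward selection, and record the resulting error variance estimate. The function names (forward_select, error_variance, noise_curve), the F-to-enter threshold f_enter used in place of an "entry" significance level, and the simulated data are all assumptions made for illustration. If the selected model neither overfits nor underfits, the response variance after adding noise is (1 + lam) * sigma^2, so the fitted slope of the curve against lam should be close to the full-model variance estimate.

```python
import numpy as np


def forward_select(X, y, f_enter=4.0):
    """Greedy forward selection; returns the selected column indices.

    f_enter is a hypothetical F-to-enter threshold standing in for the
    "entry" significance level mentioned in the abstract.
    """
    n, p = X.shape
    selected = []
    rss = float(np.sum((y - y.mean()) ** 2))  # intercept-only model
    while len(selected) < min(p, n - 2):
        best_j, best_rss = None, rss
        for j in range(p):
            if j in selected:
                continue
            A = np.column_stack([np.ones(n), X[:, selected + [j]]])
            resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
            r = float(resid @ resid)
            if r < best_rss:
                best_j, best_rss = j, r
        if best_j is None:
            break
        df = n - len(selected) - 2  # residual df once the candidate is added
        # Small floor guards against division by zero on a perfect fit.
        f_stat = (rss - best_rss) / max(best_rss / df, 1e-12)
        if f_stat < f_enter:
            break
        selected.append(best_j)
        rss = best_rss
    return selected


def error_variance(X, y, selected):
    """Residual mean square of the least-squares fit on the selected columns."""
    n = len(y)
    cols = np.asarray(selected, dtype=int)
    A = np.column_stack([np.ones(n), X[:, cols]])
    resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    return float(resid @ resid) / (n - A.shape[1])


def noise_curve(X, y, sigma2, lambdas, f_enter=4.0, n_rep=20, seed=0):
    """Average post-selection error variance at each added-noise level."""
    rng = np.random.default_rng(seed)
    curve = []
    for lam in lambdas:
        vals = []
        for _ in range(n_rep):
            noise = np.sqrt(lam * sigma2) * rng.standard_normal(len(y))
            y_star = y + noise
            sel = forward_select(X, y_star, f_enter)
            vals.append(error_variance(X, y_star, sel))
        curve.append(np.mean(vals))
    return np.array(curve)


# Simulated data (an assumption for illustration): 2 real effects, 8 null.
rng = np.random.default_rng(1)
n, p = 100, 10
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] + 3.0 * X[:, 1] + rng.standard_normal(n)

sigma2_full = error_variance(X, y, list(range(p)))  # full-model estimate
lambdas = np.array([0.0, 1.0, 2.0])
curve = noise_curve(X, y, sigma2_full, lambdas)
slope = np.polyfit(lambdas, curve, 1)[0]
print("full-model variance:", sigma2_full, "slope:", slope)
```

Comparing slope with sigma2_full across several values of f_enter gives a rough feel for how the tuning parameter could be adapted to a dataset; the comparison rule itself is left open here because the abstract does not specify it.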
Pages: 165-175
Number of pages: 11