Simultaneous selection of variables and smoothing parameters in structured additive regression models

Cited by: 41
Authors
Belitz, Christiane [2]
Lang, Stefan [1]
Affiliations
[1] Univ Innsbruck, Dept Stat, A-6020 Innsbruck, Austria
[2] Univ Munich, Dept Stat, D-80539 Munich, Germany
DOI
10.1016/j.csda.2008.05.032
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
In recent years, considerable research has been devoted to developing complex regression models that can deal simultaneously with nonlinear covariate effects and time trends, unit- or cluster-specific heterogeneity, spatial heterogeneity and complex interactions between covariates of different types. Much less effort, however, has been devoted to model and variable selection. The paper develops a methodology for the simultaneous selection of variables and the degree of smoothness in regression models with a structured additive predictor. These models are quite general, containing additive (mixed) models, geoadditive models and varying coefficient models as special cases. This approach allows one to decide whether a particular covariate enters the model linearly or nonlinearly or is removed from the model. Moreover, it is possible to decide whether a spatial or cluster-specific effect should be incorporated into the model to cope with spatial or cluster-specific heterogeneity. Particular emphasis is also placed on selecting complex interactions between covariates and effects of different types. A new penalty for two-dimensional smoothing is proposed that allows for ANOVA-type decompositions into main effects and an interaction effect without explicitly specifying the main effects. The penalty is an additive combination of other penalties. Fast algorithms and software are developed that allow one to handle even situations with many covariate effects and observations. The algorithms are related to backfitting and Markov chain Monte Carlo (MCMC) techniques, which split the problem into smaller pieces in a divide-and-conquer strategy. Confidence intervals taking model uncertainty into account are based on the bootstrap in combination with MCMC techniques. © 2008 Elsevier B.V. All rights reserved.
Pages: 61-81
Page count: 21