VALID POST-SELECTION INFERENCE

Citations: 369
Authors
Berk, Richard [1]
Brown, Lawrence [1]
Buja, Andreas [1]
Zhang, Kai [1]
Zhao, Linda [1]
Affiliations
[1] Univ Penn, Wharton Sch, Dept Stat, Philadelphia, PA 19104 USA
Funding
National Science Foundation (USA)
Keywords
Linear regression; model selection; multiple comparison; family-wise error; high-dimensional inference; sphere packing; MAXIMUM-LIKELIHOOD ESTIMATORS; MODEL-SELECTION; CONFIDENCE-INTERVALS; GAUSSIAN REGRESSION; CONDITIONAL LEVEL; PROPERTY; STUDENTS; LASSO;
DOI
10.1214/12-AOS1077
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
It is common practice in statistical data analysis to perform data-driven variable selection and derive statistical inference from the resulting model. Such inference enjoys none of the guarantees that classical statistical theory provides for tests and confidence intervals when the model has been chosen a priori. We propose to produce valid "post-selection inference" by reducing the problem to one of simultaneous inference and hence suitably widening conventional confidence and retention intervals. Simultaneity is required for all linear functions that arise as coefficient estimates in all submodels. By purchasing "simultaneity insurance" for all possible submodels, the resulting post-selection inference is rendered universally valid under all possible model selection procedures. This inference is therefore generally conservative for particular selection procedures, but it is always less conservative than full Scheffé protection. Importantly, it does not depend on the truth of the selected submodel, and hence it produces valid inference even in wrong models. We describe the structure of the simultaneous inference problem and give some asymptotic results.
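As a rough illustration of the simultaneity idea described in the abstract, the following Python sketch estimates by simulation a constant K such that the largest absolute standardized coefficient estimate, taken over every predictor in every submodel, stays below K with probability 1 - alpha. It is a minimal sketch under loud assumptions, not the authors' implementation: the error variance is assumed known and equal to 1 (so Gaussian rather than t statistics are used), the design matrix is random and small, and the function names `adjusted_directions` and `simulate_constants` are hypothetical. For comparison it also estimates the unadjusted single-coefficient constant and the Scheffé-type constant over all linear functions in the column space.

```python
import itertools
import numpy as np


def adjusted_directions(X):
    """Return unit vectors l_{j.M}, one per (submodel M, predictor j in M).

    The OLS estimate of coefficient j in submodel M is the j-th row of
    pinv(X_M) applied to the response, so normalizing that row gives the
    direction whose inner product with the response is the standardized
    estimate (assuming unit error variance).
    """
    n, p = X.shape
    dirs = []
    for size in range(1, p + 1):
        for M in itertools.combinations(range(p), size):
            pinv = np.linalg.pinv(X[:, list(M)])  # shape (|M|, n)
            for row in pinv:
                dirs.append(row / np.linalg.norm(row))
    return np.array(dirs)


def simulate_constants(X, alpha=0.05, n_sim=20000, seed=0):
    """Monte Carlo estimates of (naive, simultaneous, Scheffe) constants."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    L = adjusted_directions(X)           # (p * 2**(p-1), n)
    Q, _ = np.linalg.qr(X)               # orthonormal basis of col(X), (n, p)
    Z = rng.standard_normal((n_sim, n))  # null responses with sigma = 1
    stats = np.abs(Z @ L.T)              # |standardized estimate| per direction
    naive = np.quantile(stats[:, 0], 1 - alpha)           # one fixed coefficient
    simultaneous = np.quantile(stats.max(axis=1), 1 - alpha)  # all submodels
    scheffe = np.quantile(np.linalg.norm(Z @ Q, axis=1), 1 - alpha)
    return naive, simultaneous, scheffe


# Hypothetical example design: 50 observations, 4 predictors.
X = np.random.default_rng(1).standard_normal((50, 4))
naive, simultaneous, scheffe = simulate_constants(X)
print(f"naive={naive:.3f}  simultaneous={simultaneous:.3f}  scheffe={scheffe:.3f}")
```

Because all three constants are computed from the same simulated draws, the simultaneous constant in this sketch lies between the naive one and the Scheffé-type one, which mirrors the abstract's statement that the proposed intervals are wider than conventional ones but narrower than full Scheffé protection. Enumerating all submodels grows exponentially in the number of predictors, so this brute-force approach is only practical for small p.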
Pages: 802-837
Page count: 36