Aggregation for Gaussian regression

Cited by: 177
Authors
Bunea, Florentina [1]
Tsybakov, Alexandre B. [2]
Wegkamp, Marten H. [1]
Affiliations
[1] Florida State Univ, Dept Stat, Tallahassee, FL 32306 USA
[2] Univ Paris 06, Lab Probabil & Modeles Aleatoires, F-75252 Paris 05, France
Keywords
aggregation; lasso estimator; minimax risk; model selection; model averaging; nonparametric regression; oracle inequalities; penalized least squares;
DOI
10.1214/009053606000001587
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline codes
020208; 070103; 0714
Abstract
This paper studies statistical aggregation procedures in the regression setting. A motivating factor is the existence of many different methods of estimation, leading to possibly competing estimators. We consider here three different types of aggregation: model selection (MS) aggregation, convex (C) aggregation and linear (L) aggregation. The objective of (MS) is to select the optimal single estimator from the list; that of (C) is to select the optimal convex combination of the given estimators; and that of (L) is to select the optimal linear combination of the given estimators. We are interested in evaluating the rates of convergence of the excess risks of the estimators obtained by these procedures. Our approach is motivated by recently published minimax results [Nemirovski, A. (2000). Topics in non-parametric statistics. Lectures on Probability Theory and Statistics (Saint-Flour, 1998). Lecture Notes in Math. 1738 85-277. Springer, Berlin; Tsybakov, A. B. (2003). Optimal rates of aggregation. Learning Theory and Kernel Machines. Lecture Notes in Artificial Intelligence 2777 303-313. Springer, Heidelberg]. There exist competing aggregation procedures achieving optimal convergence rates for each of the (MS), (C) and (L) cases separately. Since these procedures are not directly comparable with each other, we suggest an alternative solution. We prove that all three optimal rates, as well as those for the newly introduced (S) aggregation (subset selection), are nearly achieved via a single "universal" aggregation procedure. The procedure consists of mixing the initial estimators with weights obtained by penalized least squares. Two different penalties are considered: one is of the BIC type, the other is a data-dependent l1-type penalty.
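The universal procedure described in the abstract chooses aggregation weights by penalized least squares: given initial estimators f_1, ..., f_M, it selects the weight vector lambda minimizing (1/n) * sum_i (Y_i - sum_j lambda_j f_j(x_i))^2 + pen(lambda). Below is a minimal illustrative sketch of the l1-penalized variant, not the authors' implementation: it uses toy Gaussian regression data, stand-in basis-function "estimators", and a fixed penalty level alpha in place of the paper's data-dependent penalty.

```python
# Hypothetical sketch of l1-penalized aggregation (lasso-type weights),
# under the assumptions stated above. In the paper's setting the columns
# of F would be the predictions of M pre-fit estimators at the inputs x.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, M = 200, 10

# Toy Gaussian regression data: y = f(x) + noise.
x = rng.uniform(-1.0, 1.0, size=n)
y = np.sin(np.pi * x) + 0.3 * rng.standard_normal(n)

# Stand-in "initial estimators": fixed sine basis functions evaluated at x.
F = np.column_stack([np.sin(k * np.pi * x) for k in range(1, M + 1)])

# l1-penalized least squares over the aggregation weights. Note that
# sklearn's Lasso objective uses a 1/(2n) factor on the squared error,
# which differs from the criterion above only in the scaling of alpha;
# alpha=0.05 is an arbitrary illustrative choice.
agg = Lasso(alpha=0.05, fit_intercept=False).fit(F, y)
lam = agg.coef_

print("aggregation weights:", np.round(lam, 3))
print("estimators kept:", np.count_nonzero(lam))
```

A BIC-type penalty fits the same template, but since it counts the nonzero weights rather than summing their magnitudes, the resulting minimization is combinatorial rather than convex.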
Pages: 1674-1697
Number of pages: 24