Aggregated estimators and empirical complexity for least square regression

Cited by: 26
Authors
Audibert, JY
Affiliations
[1] Univ Paris 06, Lab Probabil & Modeles Aleatoires, F-75013 Paris, France
[2] CREST, Lab Finance & Assurance, F-92245 Malakoff, France
Source
ANNALES DE L INSTITUT HENRI POINCARE-PROBABILITES ET STATISTIQUES | 2004, Vol. 40, No. 6
Keywords
nonparametric regression; deviation inequalities; adaptive estimator; oracle inequalities; boosting;
DOI
10.1016/j.anihpb.2003.11.006
CLC Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject Classification Codes
020208; 070103; 0714
Abstract
Numerous empirical results have shown that combining regression procedures can be very efficient. This work provides PAC bounds for the L2 generalization error of such methods. The interest of these bounds is twofold. First, they give, for any aggregating procedure, a bound on the expected risk depending on the empirical risk and on the empirical complexity, measured by the Kullback-Leibler divergence between the aggregating distribution ρ and a prior distribution π and by the empirical mean of the variance of the regression functions under ρ. Secondly, by structural risk minimization, we derive an aggregating procedure which takes advantage of the unknown properties of the best mixture f: when the best convex combination f of d regression functions belongs to the d initial functions (i.e. when combining does not make the bias decrease), the convergence rate is of order (log d)/N. In the worst case, our combining procedure achieves a convergence rate of order √((log d)/N), which is known to be optimal in a uniform sense when d > √N (see [A. Nemirovski, in: Probability Summer School, Saint Flour, 1998; Y. Yang, Aggregating regression procedures for a better performance, 2001]). As in AdaBoost, our aggregating distribution tends to favor functions which disagree with the mixture on mispredicted points. Our algorithm is tested on artificial classification data (which have also been used to test other boosting methods, such as AdaBoost). © 2004 Elsevier SAS. All rights reserved.
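The procedure described in the abstract belongs to the exponential-weights (Gibbs) family: the aggregating distribution ρ trades the empirical risk of the base functions against the complexity term KL(ρ‖π). Below is a minimal NumPy sketch of that generic idea with a uniform prior π, not the paper's exact algorithm; the function name and the inverse-temperature parameter beta are illustrative assumptions.

```python
import numpy as np

def exponential_weights_aggregate(preds, y, beta=1.0):
    """Aggregate d base regression functions by exponential (Gibbs) weighting.

    preds : (N, d) array with preds[i, j] = f_j(x_i) on the sample.
    y     : (N,)  array of observed responses.
    beta  : inverse temperature (illustrative tuning parameter); larger
            values concentrate rho on the empirically best functions.
    """
    N, d = preds.shape
    # Empirical L2 risk r_j of each base function f_j.
    emp_risk = np.mean((preds - y[:, None]) ** 2, axis=0)
    # Gibbs weights relative to the uniform prior pi:
    #   rho_j  proportional to  pi_j * exp(-beta * N * r_j).
    # Subtracting the max before exponentiating avoids overflow.
    log_w = -beta * N * emp_risk
    rho = np.exp(log_w - log_w.max())
    rho /= rho.sum()
    # With a uniform prior, KL(rho || pi) = log(d) - H(rho) <= log(d),
    # the quantity behind the (log d)/N term in the bounds.
    return rho, preds @ rho  # weights on the simplex, aggregated predictions

# Toy usage: three base predictors, one of which matches the target.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = x ** 2 + 0.1 * rng.standard_normal(200)
preds = np.column_stack([x ** 2, x, np.full_like(x, y.mean())])
rho, agg = exponential_weights_aggregate(preds, y, beta=2.0)
print(rho)  # weight concentrates on the first (correct) base function
```

When ρ collapses onto a single base function, the aggregate reduces to empirical risk minimization over the d initial functions, the regime in which the (log d)/N rate of the abstract applies; a flatter ρ corresponds to a genuine convex combination.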
Pages: 685-736
Number of pages: 52
References
13 records in total
[1] Breiman, L. (1998). Annals of Statistics, 26, 801.
[2] Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123-140.
[3] Catoni, O. (2001). Probability Summer School.
[4] Freund, Y. (1996). In Machine Learning: Proceedings of the Thirteenth International Conference (ICML '96), p. 148.
[5] Friedman, J. (1998). Additive logistic regression.
[6] Juditsky, A. (2000). Annals of Statistics, 28, 681.
[7] Mammen, E. (1999). Annals of Statistics, 27, 1808.
[8] McAllester, D. A. (2001). Unpublished manuscript, Machine Learning.
[9] Nemirovski, A. (1998). Lectures on Probability Theory (Probability Summer School, Saint-Flour).
[10] Ratsch, G. (2000). In Proceedings of the 13th Annual Conference on Computational Learning Theory, p. 170.