Boosting with the L2 loss: Regression and classification

Times cited: 584
Authors
Bühlmann, P
Yu, B
Affiliations
[1] Swiss Fed Inst Technol, Seminar Stat, CH-8032 Zurich, Switzerland
[2] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
Keywords
functional gradient descent; LogitBoost; minimax error rate; nonparametric classification; nonparametric regression; smoothing spline;
DOI
10.1198/016214503000125
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject Classification Codes
020208; 070103; 0714
Abstract
This article investigates a computationally simple variant of boosting, L2Boost, which is constructed from a functional gradient descent algorithm with the L2 loss function. Like other boosting algorithms, L2Boost iteratively applies a prechosen fitting method, called the learner. Based on the explicit expression for the refitting of residuals in L2Boost, the case of (symmetric) linear learners is studied in detail for both regression and classification. In particular, with the number of boosting iterations m acting as the smoothing or regularization parameter, a new exponential bias-variance trade-off is found in which the variance (complexity) term increases very slowly as m tends to infinity. When the learner is a smoothing spline, an optimal rate of convergence holds for both regression and classification, and the boosted smoothing spline even adapts to higher-order, unknown smoothness. Moreover, a simple expansion of a (smoothed) 0-1 loss function is derived to reveal the importance of the decision boundary, bias reduction, and the impossibility of an additive bias-variance decomposition in classification. Finally, simulation and real-data results demonstrate the attractiveness of L2Boost. In particular, L2Boosting with a novel component-wise cubic smoothing spline is shown to be both practical and effective in the presence of high-dimensional predictors.
Pages: 324 - 339
Number of pages: 16
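
Illustration
The abstract describes L2Boost as repeated refitting of residuals with a prechosen learner. The following Python sketch shows that residual-refitting recipe in minimal form; the shallow regression tree used as the learner is an assumption for illustration only (the paper studies symmetric linear learners such as smoothing splines and a component-wise cubic smoothing spline), and the function names are hypothetical.

    # Minimal sketch of L2Boosting: fit the learner, then repeatedly refit it
    # to the current residuals and add the fits together.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def l2_boost(X, y, n_iter=50, learner=lambda: DecisionTreeRegressor(max_depth=2)):
        """Return a predictor F_m(x) built by iterative residual refitting."""
        fitted = []                        # fitted base learners g_1, ..., g_m
        residual = y.astype(float).copy()  # current residuals y_i - F(x_i)
        for _ in range(n_iter):
            g = learner().fit(X, residual)  # fit learner to current residuals
            residual -= g.predict(X)        # functional gradient step for L2 loss
            fitted.append(g)

        def predict(X_new):
            return sum(g.predict(X_new) for g in fitted)

        return predict

    # Usage: noisy sine regression; training MSE decreases as n_iter grows,
    # consistent with the iteration count m acting as the smoothing parameter.
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(200, 1))
    y = np.sin(X[:, 0]) + 0.3 * rng.standard_normal(200)
    predict = l2_boost(X, y, n_iter=20)
    print(np.mean((predict(X) - y) ** 2))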