Boosting with the L2 loss: Regression and classification

Times cited: 584
Authors
Bühlmann, P
Yu, B
Affiliations
[1] Swiss Fed Inst Technol, Seminar Stat, CH-8032 Zurich, Switzerland
[2] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
Keywords
functional gradient descent; LogitBoost; minimax error rate; nonparametric classification; nonparametric regression; smoothing spline;
DOI
10.1198/016214503000125
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject Classification Codes
020208; 070103; 0714
Abstract
This article investigates a computationally simple variant of boosting, L2Boost, which is constructed from a functional gradient descent algorithm with the L2 loss function. Like other boosting algorithms, L2Boost iteratively applies a prechosen fitting method, called the learner. Based on the explicit expression for the refitting of residuals in L2Boost, the case of (symmetric) linear learners is studied in detail for both regression and classification. In particular, with the number of boosting iterations m acting as the smoothing or regularization parameter, a new exponential bias-variance trade-off is found in which the variance (complexity) term increases very slowly as m tends to infinity. When the learner is a smoothing spline, an optimal rate of convergence holds for both regression and classification, and the boosted smoothing spline even adapts to higher-order, unknown smoothness. Moreover, a simple expansion of a (smoothed) 0-1 loss function is derived to reveal the importance of the decision boundary, bias reduction, and the impossibility of an additive bias-variance decomposition in classification. Finally, simulation and real-data results demonstrate the attractiveness of L2Boost. In particular, L2Boosting with a novel component-wise cubic smoothing spline is shown to be both practical and effective in the presence of high-dimensional predictors.
Pages: 324 - 339
Number of pages: 16
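
Illustration
The abstract describes L2Boost as repeated refitting of residuals with a prechosen learner. The following Python sketch shows that residual-refitting recipe in minimal form; the shallow regression tree used as the learner is an assumption for illustration only (the paper studies symmetric linear learners such as smoothing splines and a component-wise cubic smoothing spline), and the function names are hypothetical.

    # Minimal sketch of L2Boosting: fit the learner, then repeatedly refit it
    # to the current residuals and add the fits together.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def l2_boost(X, y, n_iter=50, learner=lambda: DecisionTreeRegressor(max_depth=2)):
        """Return a predictor F_m(x) built by iterative residual refitting."""
        fitted = []                        # fitted base learners g_1, ..., g_m
        residual = y.astype(float).copy()  # current residuals y_i - F(x_i)
        for _ in range(n_iter):
            g = learner().fit(X, residual)  # fit learner to current residuals
            residual -= g.predict(X)        # functional gradient step for L2 loss
            fitted.append(g)

        def predict(X_new):
            return sum(g.predict(X_new) for g in fitted)

        return predict

    # Usage: noisy sine regression; training MSE decreases as n_iter grows,
    # consistent with the iteration count m acting as the smoothing parameter.
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(200, 1))
    y = np.sin(X[:, 0]) + 0.3 * rng.standard_normal(200)
    predict = l2_boost(X, y, n_iter=20)
    print(np.mean((predict(X) - y) ** 2))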