Boosting with early stopping: Convergence and consistency

Cited by: 246
Authors
Zhang, T [1]
Yu, B [2]
Affiliations
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
[2] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
Keywords
boosting; greedy optimization; matching pursuit; early stopping; consistency;
DOI
10.1214/009053605000000255
Chinese Library Classification (CLC)
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification codes
020208 ; 070103 ; 0714 ;
Abstract
Boosting is one of the most significant advances in machine learning for classification and regression. In its original and computationally flexible version, boosting seeks to empirically minimize a loss function in a greedy fashion. The resulting estimator takes an additive function form and is built iteratively by applying a base estimator (or learner) to updated samples that depend on the previous iterations. An unusual regularization technique, early stopping, is employed based on cross-validation or a test set. This paper studies the numerical convergence, consistency and statistical rates of convergence of boosting with early stopping when it is carried out over the linear span of a family of basis functions. For general loss functions, we prove the convergence of boosting's greedy optimization to the infimum of the loss function over the linear span. Using the numerical convergence result, we find early-stopping strategies under which boosting is shown to be consistent based on i.i.d. samples, and we obtain bounds on the rates of convergence for boosting estimators. Simulation studies are also presented to illustrate the relevance of our theoretical results for providing insight into practical aspects of boosting. As a by-product, these results also reveal the importance of restricting the greedy search step-sizes, as known in practice through the work of Friedman and others. Moreover, our results lead to a rigorous proof that, for a linearly separable problem, AdaBoost with step-size ε → 0 becomes an L1-margin maximizer when left to run to convergence.
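To make the setup concrete, the following is a minimal illustrative sketch of greedy boosting over a finite dictionary of basis functions with a restricted step-size and validation-based early stopping. It uses squared loss as a stand-in for a general loss, and all names and parameters (boost_early_stopping, basis, step, max_iter) are assumptions for illustration only, not the authors' implementation or notation.

```python
import numpy as np

def boost_early_stopping(X, y, X_val, y_val, basis, step=0.1, max_iter=500):
    """Greedy L2-boosting over a finite dictionary `basis` of callables h: X -> R,
    with a restricted (small) step-size and validation-based early stopping.
    Illustrative sketch only; squared loss stands in for a general loss."""
    # Evaluate every basis function once on the training and validation sets.
    H_tr = np.column_stack([h(X) for h in basis])      # shape (n_train, m)
    H_va = np.column_stack([h(X_val) for h in basis])  # shape (n_val, m)

    coef = np.zeros(H_tr.shape[1])
    f_tr = np.zeros(len(y))
    f_va = np.zeros(len(y_val))
    best_coef, best_val = coef.copy(), np.mean((y_val - f_va) ** 2)

    for _ in range(max_iter):
        resid = y - f_tr                      # negative gradient of the squared loss
        scores = H_tr.T @ resid               # correlation of each basis fn with the residual
        j = np.argmax(np.abs(scores))         # greedy choice of basis function
        alpha = step * np.sign(scores[j])     # restricted step-size instead of a full line search
        coef[j] += alpha
        f_tr += alpha * H_tr[:, j]
        f_va += alpha * H_va[:, j]

        val_loss = np.mean((y_val - f_va) ** 2)
        if val_loss < best_val:               # keep the early-stopped estimator
            best_val, best_coef = val_loss, coef.copy()

    return best_coef, best_val
```

The small fixed step plays the role of the restricted greedy search step-size discussed in the abstract; the returned coefficients correspond to the iteration with the lowest validation loss, i.e. the early-stopped estimator.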
Pages: 1538-1579
Number of pages: 42