Soft margins for AdaBoost

被引:1496
作者
Rätsch, G
Onoda, T
Müller, KR
机构
[1] GMD FIRST, D-12489 Berlin, Germany
[2] CRIEPI, Komae, Tokyo, Japan
[3] Univ Potsdam, D-14469 Potsdam, Germany
关键词
ADABOOST; arcing; large margin; soft margin; classification; support vectors;
D O I
10.1023/A:1007618119488
中图分类号
TP18 [人工智能理论];
学科分类号
081104 [模式识别与智能系统]; 0812 [计算机科学与技术]; 0835 [软件工程]; 1405 [智能科学与技术];
摘要
Recently ensemble methods like ADABOOST have been applied successfully in many problems, while seemingly defying the problems of overfitting. ADABOOST rarely overfits in the low noise regime, however, we show that it clearly does so for higher noise levels. Central to the understanding of this fact is the margin distribution. ADABOOST can be viewed as a constraint gradient descent in an error function with respect to the margin. We find that ADABOOST asymptotically achieves a hard margin distribution, i.e. the algorithm concentrates its resources on a few hard-to-learn patterns that are interestingly very similar to Support Vectors. A hard margin is clearly a sub-optimal strategy in the noisy case, and regularization, in our case a "mistrust" in the data, must be introduced in the algorithm to alleviate the distortions that single difficult patterns (e.g. outliers) can cause to the margin distribution. We propose several regularization methods and generalizations of the original ADABOOST algorithm to achieve a soft margin. In particular we suggest (1) regularized ADABOOST(REG) where the gradient decent is done directly with respect to the soft margin and (2) regularized linear and quadratic programming (LP/QP-) ADABOOST, where the soft margin is attained by introducing slack variables. Extensive simulations demonstrate that the proposed regularized ADABOOST-type algorithms are useful and yield competitive results for noisy data.
引用
收藏
页码:287 / 320
页数:34
相关论文
共 48 条
[1]
[Anonymous], 2000, ADV LARGE MARGIN CLA
[2]
BENNETT K, 1998, ADV KERNEL METHODS S
[3]
BENNETT KP, 1992, OPTIMIZATION METHODS, V1, P23, DOI DOI 10.1080/10556789208805504
[4]
BERTONI A, 1997, LNCS, V5, P343
[5]
Bishop C. M., 1995, NEURAL NETWORKS PATT
[6]
Boser B. E., 1992, Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory, P144, DOI 10.1145/130385.130401
[7]
Breiman L, 1998, ANN STAT, V26, P801
[8]
Bagging predictors [J].
Breiman, L .
MACHINE LEARNING, 1996, 24 (02) :123-140
[9]
BREIMAN L, 1997, 504 U CAL STAT DEP
[10]
BREIMAN L., 1999, rapport technique n 547