Linear programming boosting via column generation

Cited by: 266
Authors
Demiriz, A [1 ]
Bennett, KP
Shawe-Taylor, J
Affiliations
[1] Rensselaer Polytech Inst, Dept Decis Sci & Engn Syst, Troy, NY 12180 USA
[2] Univ London Royal Holloway & Bedford New Coll, Dept Comp Sci, Egham TW20 0EX, Surrey, England
Funding
US National Science Foundation;
Keywords
ensemble learning; boosting; linear programming; sparseness; soft margin;
DOI
10.1023/A:1012470815092
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We examine linear program (LP) approaches to boosting and demonstrate their efficient solution using LPBoost, a column generation based simplex method. We formulate the problem as if all possible weak hypotheses had already been generated. The labels produced by the weak hypotheses become the new feature space of the problem. The boosting task becomes to construct a learning function in the label space that minimizes misclassification error and maximizes the soft margin. We prove that for classification, minimizing the 1-norm soft margin error function directly optimizes a generalization error bound. The equivalent linear program can be efficiently solved using column generation techniques developed for large-scale optimization problems. The resulting LPBoost algorithm can be used to solve any LP boosting formulation by iteratively optimizing the dual misclassification costs in a restricted LP and dynamically generating weak hypotheses to make new LP columns. We provide algorithms for soft margin classification, confidence-rated, and regression boosting problems. Unlike gradient boosting algorithms, which may converge in the limit only, LPBoost converges in a finite number of iterations to a global solution satisfying mathematically well-defined optimality conditions. The optimal solutions of LPBoost are very sparse in contrast with gradient based methods. LPBoost is competitive with AdaBoost in both solution quality and computational cost.
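The column-generation scheme summarized in the abstract can be sketched in a few lines of code. The following Python fragment is a minimal illustration, not the authors' implementation: it assumes threshold decision stumps as the weak learner, assumes X is a NumPy feature matrix and y a vector of +/-1 labels, parameterizes the misclassification-cost cap as D = 1/(m*nu), and solves the restricted dual LP with SciPy's HiGHS backend. Function names and parameter choices (best_stump, nu, tol) are illustrative assumptions.

import numpy as np
from scipy.optimize import linprog

def best_stump(X, y, u):
    # Weak learner (assumed here): the threshold stump maximizing sum_i u_i * y_i * h(x_i).
    best, best_score = None, -np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1.0, -1.0):
                pred = sign * np.where(X[:, j] <= thr, 1.0, -1.0)
                score = float(np.sum(u * y * pred))
                if score > best_score:
                    best, best_score = (j, thr, sign), score
    return best, best_score

def stump_predict(stump, X):
    j, thr, sign = stump
    return sign * np.where(X[:, j] <= thr, 1.0, -1.0)

def lpboost(X, y, nu=0.1, tol=1e-6, max_iters=100):
    m = len(y)
    D = 1.0 / (m * nu)            # cap on the dual misclassification costs (soft margin)
    u = np.full(m, 1.0 / m)       # dual misclassification costs, start uniform
    beta = 0.0
    stumps, H = [], []            # generated hypotheses and their label columns
    res = None
    for _ in range(max_iters):
        stump, score = best_stump(X, y, u)
        if score <= beta + tol:   # no hypothesis prices out: restricted LP is optimal
            break
        stumps.append(stump)
        H.append(y * stump_predict(stump, X))      # new column of y_i * h_j(x_i)
        # Restricted dual LP over variables (u, beta):
        #   min beta  s.t.  sum_i u_i y_i h_j(x_i) <= beta for all generated j,
        #                   sum_i u_i = 1,  0 <= u_i <= D.
        c = np.concatenate([np.zeros(m), [1.0]])
        A_ub = np.hstack([np.array(H), -np.ones((len(H), 1))])
        A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(len(H)), A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0.0, D)] * m + [(None, None)], method="highs")
        u, beta = res.x[:m], res.x[m]
    if res is None:
        return [], np.array([])
    # Primal ensemble weights are the multipliers of the hypothesis rows;
    # HiGHS reports non-positive marginals for <= rows in a minimization, so negate.
    a = -res.ineqlin.marginals
    return stumps, a

def predict(stumps, a, X):
    scores = sum(w * stump_predict(s, X) for s, w in zip(stumps, a))
    return np.sign(scores)

The stopping test (score <= beta + tol) mirrors the finite-termination property mentioned in the abstract: once no weak hypothesis violates a dual constraint, the restricted LP solution is globally optimal.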
Pages: 225 - 254
Page count: 30