Binarized Support Vector Machines

Cited by: 29
Authors
Carrizosa, Emilio [1]
Martin-Barragan, Belen [2]
Morales, Dolores Romero [3]
Affiliations
[1] Univ Seville, Dept Estadist & Invest Operat, E-41012 Seville, Spain
[2] Univ Carlos III Madrid, Dept Estadist, Madrid 28903, Spain
[3] Univ Oxford, Said Business Sch, Oxford OX1 1HP, England
Keywords
supervised classification; binarization; column generation; support vector machines; rule extraction; classification; selection
DOI
10.1287/ijoc.1090.0317
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
The widely used support vector machine (SVM) method has been shown to yield very good results in supervised classification problems. Other methods, such as classification trees, have become more popular among practitioners than SVM thanks to their interpretability, which is an important issue in data mining. In this work, we propose an SVM-based method that automatically detects the most important predictor variables and the role they play in the classifier. In particular, the proposed method is able to detect those values and intervals that are critical for the classification. The method involves solving a linear programming problem, in the spirit of the Lasso method, with a large number of decision variables. The reported numerical experience shows that a rather direct use of the standard column generation strategy leads to a classification method that, in terms of classification ability, is competitive with the standard linear SVM and classification trees. Moreover, the proposed method is robust; i.e., it is stable in the presence of outliers and invariant to changes of scale or measurement units of the predictor variables. When the complexity of the classifier is an important issue, a wrapper feature selection method is applied, yielding simpler but still competitive classifiers.
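The abstract names two ingredients: a classifier built from critical values and intervals of the predictors, and a Lasso-style linear program with many variables solved by column generation. The sketch below illustrates both under explicit assumptions that are not taken from the paper: each binary feature is assumed to be an indicator "x_j >= b" with the cutoff b placed at an observed value of predictor j, and the LP is assumed to be the standard L1-norm soft-margin formulation written in the docstring of `price_columns`. The helper names (`candidate_cutoffs`, `binarize`, `price_columns`) are hypothetical, the dual values `alpha` would come from an LP solver that is not shown, and the authors' exact construction may differ.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation (assumption): a binary feature (j, b) evaluates
# to 1 when predictor j satisfies x[j] >= b, and to 0 otherwise.
Feature = Tuple[int, float]


def candidate_cutoffs(X: Sequence[Sequence[float]]) -> List[Feature]:
    """One cutoff per distinct observed value of each predictor, skipping the
    smallest value (that indicator would be 1 for every observation)."""
    features: List[Feature] = []
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        features.extend((j, b) for b in values[1:])
    return features


def binarize(X: Sequence[Sequence[float]], features: Sequence[Feature]) -> List[List[int]]:
    """Replace each observation by its 0/1 vector of threshold indicators."""
    return [[1 if row[j] >= b else 0 for (j, b) in features] for row in X]


def price_columns(
    Phi: Sequence[Sequence[int]],
    y: Sequence[int],
    alpha: Sequence[float],
    features: Sequence[Feature],
    tol: float = 1e-9,
) -> Optional[Tuple[Feature, float]]:
    """Pricing step of column generation for the assumed Lasso-style LP

        min   sum_k (w_k^+ + w_k^-) + C * sum_i xi_i
        s.t.  y_i * (sum_k (w_k^+ - w_k^-) * phi_k(x_i) + beta) + xi_i >= 1,
              w^+, w^-, xi >= 0,  beta free.

    With duals alpha_i >= 0 of the margin constraints, the reduced costs of
    w_k^+ and w_k^- are 1 - s_k and 1 + s_k, where
    s_k = sum_i alpha_i * y_i * phi_k(x_i). One of them is negative iff
    |s_k| > 1; return the feature with the largest |s_k| above 1, else None.
    """
    best: Optional[Feature] = None
    best_score = 1.0 + tol
    for k, feat in enumerate(features):
        s_k = sum(a * yi * row[k] for a, yi, row in zip(alpha, y, Phi))
        if abs(s_k) > best_score:
            best, best_score = feat, abs(s_k)
    return None if best is None else (best, best_score)


if __name__ == "__main__":
    # Toy data: two predictors measured on very different scales.
    X = [[1.0, 200.0], [2.0, 150.0], [3.0, 900.0], [4.0, 120.0]]
    y = [-1, -1, 1, 1]
    feats = candidate_cutoffs(X)
    Phi = binarize(X, feats)
    # Illustrative duals only; a real loop would take them from the LP solver.
    print(price_columns(Phi, y, [0.6] * 4, feats))  # -> ((0, 3.0), 1.2)
```

Because each indicator depends only on the ordering of the observed values, a positive rescaling of a predictor, or an extreme value that remains the largest, leaves the binarized matrix unchanged in this sketch, which is one way to read the robustness claim in the abstract. In a full column generation loop, the feature returned by `price_columns` would be added to the restricted LP, the LP re-solved for new duals, and the loop stopped once `price_columns` returns `None`.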
Pages: 154-167
Number of pages: 14