Statistical performance of support vector machines

被引:89
作者
Blanchard, Gilles [1 ]
Bousquet, Olivier [2 ]
Massart, Pascal [3 ]
机构
[1] Fraunhofer Inst FIRST, D-12489 Berlin, Germany
[2] Google, CH-8002 Zurich, Switzerland
[3] Univ Paris 11, Math Lab, F-91405 Orsay, France
关键词
classification; support vector machine; model selection; oracle inequality;
D O I
10.1214/009053607000000839
中图分类号
O21 [概率论与数理统计]; C8 [统计学];
学科分类号
020208 ; 070103 ; 0714 ;
摘要
The support vector machine (SVM) algorithm is well known to the computer learning community for its very good practical results. The goal of the present paper is to study this algorithm from a statistical perspective, using tools of concentration theory and empirical processes. Our main result builds on the observation made by other authors that the SVM can be viewed as a statistical regularization procedure. From this point of view, it can also be interpreted as a model selection principle using a penalized criterion. It is then possible to adapt general methods related to model selection in this framework to Study two important points: (1) what is the minimum penalty and how does it compare to the penalty actually used in the SVM algorithm; (2) is it possible to obtain "oracle inequalities" in that setting, for the specific loss function used in the SVM algorithm? We show that the answer to the latter question is positive and provides relevant insight to the former. Our result shows that it is possible to obtain fast rates of convergence for SVMs.
引用
收藏
页码:489 / 531
页数:43
相关论文
共 41 条
[1]   Scale-sensitive dimensions, uniform convergence, and learnability [J].
Alon, N ;
BenDavid, S ;
CesaBianchi, N ;
Haussler, D .
JOURNAL OF THE ACM, 1997, 44 (04) :615-631
[2]  
Bartlett P. L., 2003, Journal of Machine Learning Research, V3, P463, DOI 10.1162/153244303321897690
[3]   Empirical minimization [J].
Bartlett, PL ;
Mendelson, S .
PROBABILITY THEORY AND RELATED FIELDS, 2006, 135 (03) :311-334
[4]   Convexity, classification, and risk bounds [J].
Bartlett, PL ;
Jordan, MI ;
McAuliffe, JD .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2006, 101 (473) :138-156
[5]   Local Rademacher complexities [J].
Bartlett, PL ;
Bousquet, O ;
Mendelson, S .
ANNALS OF STATISTICS, 2005, 33 (04) :1497-1537
[6]   The sample complexity of pattern classification with neural networks: The size of the weights is more important than the size of the network [J].
Bartlett, PL .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1998, 44 (02) :525-536
[7]  
BIRMAN MS, 1967, MATH USSR SB, V2, P295, DOI [10.1070/SM1967v002n03ABEH002343, DOI 10.1070/SM1967V002N03ABEH002343]
[8]   Statistical properties of kernel principal component analysis [J].
Blanchard, Gilles ;
Bousquet, Olivier ;
Zwald, Laurent .
MACHINE LEARNING, 2007, 66 (2-3) :259-294
[9]  
BOUSQUET A, 2002, C R ACAD SCI PARIS S, V1334, P495
[10]   Stability and generalization [J].
Bousquet, O ;
Elisseeff, A .
JOURNAL OF MACHINE LEARNING RESEARCH, 2002, 2 (03) :499-526