On the generalization of soft margin algorithms

Cited by: 44
Authors
Shawe-Taylor, J [1]
Cristianini, N [1]
Affiliation
[1] Univ London Royal Holloway & Bedford New Coll, Dept Comp Sci, Egham TW20 0EX, Surrey, England
Funding
UK Engineering and Physical Sciences Research Council;
Keywords
generalization; margin; margin distribution; neural networks; probably approximately correct (pac) learning; ridge regression; soft margin; statistical learning; support vector machines (SVMs);
DOI
10.1109/TIT.2002.802647
Chinese Library Classification (CLC)
TP [automation technology, computer technology];
Discipline classification code
0812;
Abstract
Generalization bounds depending on the margin of a classifier are a relatively recent development. They provide an explanation of the performance of state-of-the-art learning systems such as support vector machines (SVMs) [1] and Adaboost [2]. The difficulty with these bounds has been either their lack of robustness or their looseness. The question of whether the generalization of a classifier can be more tightly bounded in terms of a robust measure of the distribution of margin values has remained open for some time. The paper answers this open question in the affirmative and, furthermore, the analysis leads to bounds that motivate the previously heuristic soft margin SVM algorithms as well as justifying the use of the quadratic loss in neural network training algorithms. The results are extended to give bounds for the probability of failing to achieve a target accuracy in regression prediction, with a statistical analysis of ridge regression and Gaussian processes as a special case. The analysis presented in the paper has also led to new boosting algorithms described elsewhere.
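As a rough illustration of the ridge regression analyzed in the abstract as a special case: the quadratic penalty shrinks the weight vector, trading training fit for a smaller norm, which is the trade-off the paper's soft margin bounds formalize. The sketch below uses hypothetical toy data and an arbitrarily chosen regularization strength `lam`; it is not the paper's algorithm, just the standard closed-form ridge estimator.

```python
import numpy as np

# Toy data (hypothetical): 50 samples, 3 features, small label noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=50)

lam = 1.0  # regularization strength; an arbitrary choice for illustration

# Ordinary least squares: w = (X^T X)^{-1} X^T y
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge regression: w = (X^T X + lam * I)^{-1} X^T y
# The quadratic penalty lam * ||w||^2 shrinks the solution toward zero.
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
```

The norm of the ridge solution is never larger than the least-squares norm, and it decreases monotonically as `lam` grows; the paper's bounds relate this norm control to generalization.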
Pages: 2721-2735
Page count: 15
Related papers
33 records in total
[1]   Scale-sensitive dimensions, uniform convergence, and learnability [J].
Alon, N ;
Ben-David, S ;
Cesa-Bianchi, N ;
Haussler, D .
JOURNAL OF THE ACM, 1997, 44 (04) :615-631
[2]  
[Anonymous], DOKL AKAD NAUK SSSR
[3]   The hardness of approximate optima in lattices, codes, and systems of linear equations [J].
Arora, S ;
Babai, L ;
Stern, J ;
Sweedyk, Z .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 1997, 54 (02) :317-331
[4]  
Bartlett P, 1999, ADVANCES IN KERNEL METHODS, P43
[5]   The sample complexity of pattern classification with neural networks: The size of the weights is more important than the size of the network [J].
Bartlett, PL .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1998, 44 (02) :525-536
[6]   Enlarging the margins in perceptron decision trees [J].
Bennett, KP ;
Cristianini, N ;
Shawe-Taylor, J ;
Wu, DH .
MACHINE LEARNING, 2000, 41 (03) :295-313
[7]  
BENNETT KP, 2001, IN PRESS MACH LEARNI
[8]  
BENNETT KP, 2000, MACH LEARN P 17 INT
[9]  
BENNETT KP, 1992, OPTIMIZATION METHODS, V1, P23, DOI DOI 10.1080/10556789208805504
[10]  
Boser B. E., 1992, Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory, P144, DOI 10.1145/130385.130401