Structural risk minimization over data-dependent hierarchies

Cited by: 289
Authors
Shawe-Taylor, J [1 ]
Bartlett, PL
Williamson, RC
Anthony, M
Affiliations
[1] Univ London Royal Holloway & Bedford New Coll, Dept Comp Sci, Egham TW20 0EX, Surrey, England
[2] Australian Natl Univ, Dept Syst Engn, Canberra, ACT 0200, Australia
[3] Australian Natl Univ, Dept Engn, Canberra, ACT 0200, Australia
[4] London Sch Econ, Dept Math, London WC2A 2AE, England
Funding
Australian Research Council;
Keywords
computational learning theory; fat-shattering dimension; learning machines; maximal margin; probable smoothness; luckiness; probably approximately correct learning; support vector machines; uniform convergence; Vapnik-Chervonenkis dimension
DOI
10.1109/18.705570
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
The paper introduces some generalizations of Vapnik's method of structural risk minimization (SRM). As well as making explicit some of the details on SRM, it provides a result that allows one to trade off errors on the training sample against improved generalization performance. It then considers the more general case when the hierarchy of classes is chosen in response to the data. A result is presented on the generalization performance of classifiers with a "large margin." This theoretically explains the impressive generalization performance of the maximal margin hyperplane algorithm of Vapnik and co-workers (which is the basis for their support vector machines). The paper concludes with a more general result in terms of "luckiness" functions, which provides a quite general way for exploiting serendipitous simplicity in observed data to obtain better prediction accuracy from small training sets. Four examples are given of such functions, including the Vapnik-Chervonenkis (VC) dimension measured on the sample.
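To make the "large margin" result concrete: the paper's bounds replace the capacity of the full hyperplane class with a data-dependent quantity that shrinks as the observed margin grows. The sketch below is an illustration, not the paper's construction; it assumes NumPy and scikit-learn, uses synthetic data, and approximates the hard-margin hyperplane of Vapnik and co-workers with a large C. It fits a maximal margin hyperplane and reports the geometric margin on which such bounds depend.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two linearly separable clouds; synthetic data for illustration only.
X = np.vstack([rng.normal(-2.0, 0.5, size=(50, 2)),
               rng.normal(+2.0, 0.5, size=(50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

# A very large C approximates the hard (maximal) margin separator.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]
# For the standard SVM normalization (support vectors satisfy
# |w.x + b| = 1), the geometric margin of the hyperplane is 1 / ||w||.
print("geometric margin:", 1.0 / np.linalg.norm(w))

A larger printed margin corresponds to a lower-capacity class in the data-dependent hierarchy, and hence to a tighter generalization bound from the same sample size.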
Pages: 1926-1940
Number of pages: 15