Suboptimality of penalized empirical risk minimization in classification

Cited by: 9
Author
Lecué, Guillaume [1]
Affiliation
[1] Univ Paris 06, Lab Probabil & Modeles Aleatoires, CNRS, UMR 7599, F-75252 Paris, France
Source
LEARNING THEORY, PROCEEDINGS | 2007, Vol. 4539
DOI
10.1007/978-3-540-72927-3_12
CLC number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Let F be a set of M classification procedures with values in [-1, 1]. Given a loss function, we want to construct a procedure that mimics the best procedure in F at the best possible rate. This fastest rate is called the optimal rate of aggregation. Considering a continuous scale of loss functions with various types of convexity, we prove that the optimal rate of aggregation can be either ((log M)/n)^{1/2} or (log M)/n. We prove that, if all M classifiers are binary, the (penalized) Empirical Risk Minimization procedures are suboptimal (even under the margin/low-noise condition) when the loss function is somewhat more than convex, whereas, in that case, aggregation procedures with exponential weights achieve the optimal rate of aggregation.
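The abstract's contrast between ERM selection and exponential-weights aggregation is concrete enough to sketch in code. Below is a minimal, hypothetical Python illustration, not taken from the paper: given M pre-trained classifiers and a margin loss, (penalized) ERM picks the single empirical minimizer in F, while the exponential-weights procedure returns a convex combination of the M classifiers with weights decaying exponentially in empirical risk. The temperature parameter and all helper names are assumptions made for illustration.

```python
import numpy as np

def empirical_risks(predictions, y, loss):
    """Empirical risk of each of the M classifiers.

    predictions: (M, n) array of predictions with values in [-1, 1]
    y:           (n,) array of labels in {-1, +1}
    loss:        vectorized function of the margin y * f(x)
    """
    return loss(predictions * y).mean(axis=1)  # shape (M,)

def erm_select(predictions, y, loss):
    # (Penalized) ERM over F returns a single classifier: the empirical
    # minimizer. The paper shows this kind of selector is suboptimal when
    # the loss is "somewhat more than convex".
    risks = empirical_risks(predictions, y, loss)
    return predictions[np.argmin(risks)]

def exp_weights_aggregate(predictions, y, loss, temperature=1.0):
    # Aggregation with exponential weights: a convex combination of the M
    # classifiers, with weights proportional to exp(-n * risk / temperature).
    # The specific temperature choice is an assumption, not the paper's.
    n = y.shape[0]
    risks = empirical_risks(predictions, y, loss)
    logits = -n * risks / temperature
    w = np.exp(logits - logits.max())  # numerically stabilized softmax
    w /= w.sum()
    return w @ predictions             # stays in [-1, 1] (convex combination)

# Usage sketch with a hinge-type loss phi(u) = max(0, 1 - u):
rng = np.random.default_rng(0)
y = rng.choice([-1.0, 1.0], size=200)
preds = rng.choice([-1.0, 1.0], size=(5, 200))  # M = 5 binary classifiers
hinge = lambda margin: np.maximum(0.0, 1.0 - margin)
f_erm = erm_select(preds, y, hinge)
f_agg = exp_weights_aggregate(preds, y, hinge)
```

The design difference mirrors the two rates in the abstract: a selector committed to a single classifier in F cannot in general beat the ((log M)/n)^{1/2} rate, whereas averaging with exponential weights exploits the convexity of the loss and can reach (log M)/n.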
Pages: 142-156
Page count: 15