Suboptimality of penalized empirical risk minimization in classification

Cited by: 9
Author
Lecué, Guillaume [1]
Affiliation
[1] Univ Paris 06, Lab Probabil & Modeles Aleatoires, CNRS, UMR 7599, F-75252 Paris, France
Source
LEARNING THEORY, PROCEEDINGS | 2007, Vol. 4539
DOI
10.1007/978-3-540-72927-3_12
CLC number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Let F be a set of M classification procedures with values in [-1, 1]. Given a loss function, we want to construct a procedure that mimics the best procedure in F at the best possible rate. This fastest rate is called the optimal rate of aggregation. Considering a continuous scale of loss functions with various types of convexity, we prove that the optimal rate of aggregation can be either ((log M)/n)^{1/2} or (log M)/n. We prove that, if all M classifiers are binary, the (penalized) Empirical Risk Minimization procedures are suboptimal (even under the margin/low-noise condition) when the loss function is somewhat more than convex, whereas, in that case, aggregation procedures with exponential weights achieve the optimal rate of aggregation.
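The abstract's contrast between ERM selection and exponential-weights aggregation is concrete enough to sketch in code. Below is a minimal, hypothetical Python illustration, not taken from the paper: given M pre-trained classifiers and a margin loss, (penalized) ERM picks the single empirical minimizer in F, while the exponential-weights procedure returns a convex combination of the M classifiers with weights decaying exponentially in empirical risk. The temperature parameter and all helper names are assumptions made for illustration.

```python
import numpy as np

def empirical_risks(predictions, y, loss):
    """Empirical risk of each of the M classifiers.

    predictions: (M, n) array of predictions with values in [-1, 1]
    y:           (n,) array of labels in {-1, +1}
    loss:        vectorized function of the margin y * f(x)
    """
    return loss(predictions * y).mean(axis=1)  # shape (M,)

def erm_select(predictions, y, loss):
    # (Penalized) ERM over F returns a single classifier: the empirical
    # minimizer. The paper shows this kind of selector is suboptimal when
    # the loss is "somewhat more than convex".
    risks = empirical_risks(predictions, y, loss)
    return predictions[np.argmin(risks)]

def exp_weights_aggregate(predictions, y, loss, temperature=1.0):
    # Aggregation with exponential weights: a convex combination of the M
    # classifiers, with weights proportional to exp(-n * risk / temperature).
    # The specific temperature choice is an assumption, not the paper's.
    n = y.shape[0]
    risks = empirical_risks(predictions, y, loss)
    logits = -n * risks / temperature
    w = np.exp(logits - logits.max())  # numerically stabilized softmax
    w /= w.sum()
    return w @ predictions             # stays in [-1, 1] (convex combination)

# Usage sketch with a hinge-type loss phi(u) = max(0, 1 - u):
rng = np.random.default_rng(0)
y = rng.choice([-1.0, 1.0], size=200)
preds = rng.choice([-1.0, 1.0], size=(5, 200))  # M = 5 binary classifiers
hinge = lambda margin: np.maximum(0.0, 1.0 - margin)
f_erm = erm_select(preds, y, hinge)
f_agg = exp_weights_aggregate(preds, y, hinge)
```

The design difference mirrors the two rates in the abstract: a selector committed to a single classifier in F cannot in general beat the ((log M)/n)^{1/2} rate, whereas averaging with exponential weights exploits the convexity of the loss and can reach (log M)/n.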
Pages: 142-156
Page count: 15