Suboptimality of penalized empirical risk minimization in classification

Cited by: 9
Author
Lecue, Guillaume [1 ]
Affiliation
[1] Univ Paris 06, Lab Probabil & Modeles Aleatoires, CNRS, UMR 7599, F-75252 Paris, France
Source
LEARNING THEORY, PROCEEDINGS | 2007, Vol. 4539
DOI
10.1007/978-3-540-72927-3_12
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Let F be a set of M classification procedures with values in [-1, 1]. Given a loss function, we want to construct a procedure which mimics the best procedure in F at the best possible rate. This fastest rate is called the optimal rate of aggregation. Considering a continuous scale of loss functions with various types of convexity, we prove that optimal rates of aggregation can be either ((log M)/n)^{1/2} or (log M)/n. We prove that, if all M classifiers are binary, the (penalized) Empirical Risk Minimization procedures are suboptimal (even under the margin/low-noise condition) when the loss function is somewhat more than convex, whereas, in that case, aggregation procedures with exponential weights achieve the optimal rate of aggregation.
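
For concreteness, the exponential-weights aggregation the abstract refers to can be sketched as follows. This is a minimal illustration, not the paper's exact construction: the hinge loss stands in for the convex loss, and the function name and the temperature parameter are assumptions introduced here for the sketch.

import numpy as np

def exp_weights_aggregate(preds, y, temperature=1.0):
    """Aggregate M procedures with exponential (Gibbs) weights.

    preds: (M, n) array of predictions in [-1, 1], one row per procedure.
    y:     (n,) array of labels in {-1, +1}.
    Returns the weights w and the aggregated predictions on the sample.
    """
    # Empirical hinge risk of each procedure: mean of max(0, 1 - y * f(x)).
    risks = np.mean(np.maximum(0.0, 1.0 - y * preds), axis=1)
    n = y.shape[0]
    # Gibbs weights w_j proportional to exp(-n * risk_j / temperature);
    # subtracting the minimum risk avoids underflow without changing w.
    w = np.exp(-n * (risks - risks.min()) / temperature)
    w /= w.sum()
    # The aggregate sum_j w_j f_j is a convex combination, so it stays in
    # [-1, 1], unlike (penalized) ERM, which selects a single procedure in F.
    return w, preds.T @ w

Unlike selecting the empirical risk minimizer in F, the aggregate averages the M procedures, which is what permits the faster (log M)/n rate for sufficiently convex losses.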
Pages: 142-156
Page count: 15