Are loss functions all the same?

Cited: 293
Authors
Rosasco, L [1 ]
De Vito, E
Caponnetto, A
Piana, M
Verri, A
Affiliations
[1] Univ Genoa, DISI, INFM, I-16146 Genoa, Italy
[2] Univ Modena, Dipartimento Matemat, I-41100 Modena, Italy
[3] Ist Nazl Fis Nucl, Sez Genova, I-16146 Genoa, Italy
DOI
10.1162/089976604773135104
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
In this letter, we investigate the impact of choosing different loss functions from the viewpoint of statistical learning theory. We introduce a convexity assumption, which is met by all loss functions commonly used in the literature, and study how the bound on the estimation error changes with the loss. We also derive a general result on the minimizer of the expected risk for a convex loss function in the case of classification. The main outcome of our analysis is that for classification, the hinge loss appears to be the loss of choice. Other things being equal, the hinge loss leads to a convergence rate practically indistinguishable from the logistic loss rate and much better than the square loss rate. Furthermore, if the hypothesis space is sufficiently rich, the bounds obtained for the hinge loss are not loosened by the thresholding stage.
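As an illustrative aid (not part of the original record), the three convex losses compared in the abstract — hinge, logistic, and square — can each be written as a function of the classification margin y·f(x), with labels y ∈ {−1, +1}; the function names below are hypothetical.

```python
import math

def hinge_loss(margin):
    # Hinge loss: max(0, 1 - y*f(x)); exactly zero once the margin exceeds 1.
    return max(0.0, 1.0 - margin)

def logistic_loss(margin):
    # Logistic loss: log(1 + exp(-y*f(x))); smooth and strictly positive.
    return math.log(1.0 + math.exp(-margin))

def square_loss(margin):
    # Square loss: (1 - y*f(x))^2; also penalizes margins larger than 1.
    return (1.0 - margin) ** 2

if __name__ == "__main__":
    # All three are convex in the margin, as the paper's convexity
    # assumption requires; they differ in how they penalize large margins.
    for m in (-1.0, 0.0, 1.0, 2.0):
        print(m, hinge_loss(m), logistic_loss(m), square_loss(m))
```

Note, for example, that at margin 2 the hinge and logistic losses are at or near zero while the square loss is 1 — a sketch of why, other things being equal, the square loss behaves differently for well-classified points.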
Pages: 1063–1076
Page count: 14