In-Sample and Out-of-Sample Model Selection and Error Estimation for Support Vector Machines

Times cited: 87
Authors
Anguita, Davide [1]
Ghio, Alessandro [1]
Oneto, Luca [1]
Ridella, Sandro [1]
Affiliation
[1] Univ Genoa, DITEN, I-16145 Genoa, Italy
Keywords
Bootstrap; cross validation; error estimation; leave one out; model selection; statistical learning theory (SLT); structural risk minimization (SRM); support vector machine (SVM); PROBABILITY-INEQUALITIES; CROSS-VALIDATION; NEURAL-NETWORKS; CANCER; CLASSIFICATION; COMPLEXITY; DISCOVERY
DOI
10.1109/TNNLS.2012.2202401
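Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code
140502 [Artificial intelligence]
Abstract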
In-sample approaches to model selection and error estimation for support vector machines (SVMs) are less widespread than out-of-sample methods, in which part of the data is removed from the training set for validation and testing. This is mainly because in-sample methods are not straightforward to apply in practice, whereas out-of-sample methods provide satisfactory results in many cases. In this paper, we survey some recent and not-so-recent results of the data-dependent structural risk minimization (SRM) framework and propose a reformulation of the SVM learning algorithm that allows the in-sample approach to be applied effectively. Experiments on both simulated and real-world datasets show that our in-sample approach compares favorably with out-of-sample methods, especially in cases where the latter provide questionable results. In particular, when the number of samples is small compared with their dimensionality, as in the classification of microarray data, our proposal can outperform conventional out-of-sample approaches such as cross validation, leave-one-out, and the bootstrap.
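The out-of-sample methods named in the abstract differ only in how they hold data out of training. As a minimal sketch (not the authors' in-sample method, and not an SVM), the Python below shows how k-fold cross validation, leave-one-out, and the bootstrap (training on n draws with replacement, validating on the out-of-bag samples) split n samples into training and held-out indices, then average the held-out error. The function names, the toy data, and the 1-D nearest-class-mean classifier standing in for an SVM are hypothetical choices made for illustration only.

```python
import random


def kfold_splits(n, k, seed=0):
    """k-fold cross validation: each sample is held out exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, val in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, val


def loo_splits(n):
    """Leave-one-out: k-fold with k = n, one sample held out per split."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]


def bootstrap_splits(n, b, seed=0):
    """Bootstrap: train on n draws with replacement, validate out-of-bag."""
    rng = random.Random(seed)
    for _ in range(b):
        train = [rng.randrange(n) for _ in range(n)]
        drawn = set(train)
        oob = [j for j in range(n) if j not in drawn]
        if oob:  # skip the rare replicate that leaves nothing out-of-bag
            yield train, oob


def nearest_mean_fit(xs, ys):
    """Hypothetical stand-in for an SVM: 1-D nearest-class-mean rule, labels -1/+1."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == -1]
    if not pos or not neg:  # degenerate training set: predict the only class seen
        only = 1 if pos else -1
        return lambda x: only
    mp, mn = sum(pos) / len(pos), sum(neg) / len(neg)
    return lambda x: 1 if abs(x - mp) <= abs(x - mn) else -1


def estimate_error(xs, ys, fit, splits):
    """Average misclassification rate over the held-out part of each split."""
    errs = []
    for train, val in splits:
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        wrong = sum(1 for i in val if model(xs[i]) != ys[i])
        errs.append(wrong / len(val))
    return sum(errs) / len(errs)


# Toy, well-separated 1-D data (illustrative only): negatives 0.0-0.9, positives 1.0-1.9.
xs = [0.1 * i for i in range(10)] + [1.0 + 0.1 * i for i in range(10)]
ys = [-1] * 10 + [1] * 10

for name, splits in [
    ("5-fold CV", kfold_splits(len(xs), 5)),
    ("leave-one-out", loo_splits(len(xs))),
    ("bootstrap", bootstrap_splits(len(xs), 50)),
]:
    print(name, estimate_error(xs, ys, nearest_mean_fit, splits))
```

Every estimate here trains on fewer than all n samples (or on a resampled copy of them), which is the cost the abstract points to when n is small relative to the data dimensionality.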
Pages: 1390-1406
Number of pages: 17