Study on the Impact of Partition-Induced Dataset Shift on k-fold Cross-Validation

Times cited: 272
Authors
Garcia Moreno-Torres, Jose [1]
Saez, Jose A. [1]
Herrera, Francisco [1]
Affiliation
[1] Univ Granada, Dept Comp Sci & Artificial Intelligence, E-18001 Granada, Spain
Keywords
Covariate shift; cross-validation; dataset shift; partitioning; accuracy
DOI
10.1109/TNNLS.2012.2199516
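Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification code
140502 [Artificial intelligence]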
Abstract
Cross-validation is a widely used technique for evaluating classifier performance. However, it can introduce dataset shift, a harmful factor that is often overlooked and can lead to inaccurate performance estimates. This paper analyzes the prevalence and impact of partition-induced covariate shift on different k-fold cross-validation schemes. The experimental results show that the degree of partition-induced covariate shift depends on the cross-validation scheme used. Consequently, a poorly chosen scheme may reduce the accuracy of a single classifier's performance estimate and also increase the number of cross-validation repetitions needed to reach a stable estimate.
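The idea that the partitioning scheme alone can create covariate shift can be illustrated with a small Python sketch. It is not the paper's experimental method: the three fold-assignment functions (contiguous_folds, interleaved_folds, stratified_folds), the toy data, and the shift proxy covariate_gap are hypothetical choices made here for illustration. The proxy averages, over the k folds, the absolute difference between the test-fold mean and the training-set mean of a single feature; a larger value means the test folds resemble the training data less.

```python
from statistics import mean

# Illustrative sketch only: the fold-assignment helpers and the
# covariate_gap proxy below are hypothetical, not the paper's method.

def contiguous_folds(n, k):
    """Assign example i to fold i*k//n: blocks of consecutive indices."""
    return [i * k // n for i in range(n)]

def interleaved_folds(n, k):
    """Assign example i to fold i % k: deal examples out in turn."""
    return [i % k for i in range(n)]

def stratified_folds(labels, k):
    """Deal each class's examples out to the k folds in turn, so every
    fold keeps roughly the overall class proportions."""
    folds, seen = [], {}
    for y in labels:
        c = seen.get(y, 0)
        folds.append(c % k)
        seen[y] = c + 1
    return folds

def covariate_gap(xs, folds, k):
    """Crude proxy for partition-induced covariate shift: the absolute
    gap between test-fold and training-set means of one feature,
    averaged over the k folds."""
    gaps = []
    for f in range(k):
        test = [x for x, g in zip(xs, folds) if g == f]
        train = [x for x, g in zip(xs, folds) if g != f]
        gaps.append(abs(mean(test) - mean(train)))
    return mean(gaps)

def positives_per_fold(labels, folds, k):
    """Number of class-1 examples in each test fold (binary labels)."""
    return [sum(1 for y, g in zip(labels, folds) if g == f and y == 1)
            for f in range(k)]

if __name__ == "__main__":
    n, k = 100, 5
    xs = [float(i) for i in range(n)]       # one feature, sorted ascending
    ys = [0 if x < 50 else 1 for x in xs]   # label determined by the feature
    schemes = {
        "contiguous": contiguous_folds(n, k),
        "interleaved": interleaved_folds(n, k),
        "stratified": stratified_folds(ys, k),
    }
    for name, folds in schemes.items():
        print(name, covariate_gap(xs, folds, k), positives_per_fold(ys, folds, k))
```

On this toy data, contiguous blocks give a gap of 30.0 and every test fold except the middle one contains a single class, whereas the interleaved and class-stratified assignments give a gap of 1.5 with ten examples of each class per fold. The effect is exaggerated here because the data is ordered by the feature; the sketch only shows that the partition alone can change how closely the test folds resemble the training data.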
Pages: 1304-1312
Page count: 9