A general model for finite-sample effects in training and testing of competing classifiers

Cited: 26
Authors
Beiden, SV [1]
Maloof, MA [2]
Wagner, RF [1]
Affiliations
[1] US FDA, Ctr Devices & Radiol Hlth, Rockville, MD 20857 USA
[2] Georgetown Univ, Dept Comp Sci, Washington, DC 20057 USA
Keywords
pattern recognition; classifier design and evaluation; discriminant analysis; ROC analysis; components-of-variance models; bootstrap methods
Keywords Plus
STATISTICAL PATTERN-RECOGNITION; OF-VARIANCE MODELS; ROC ANALYSIS; CROSS-VALIDATION; COMPONENTS; PERFORMANCE; BOOTSTRAP; DESIGN; SIZE; INDEX
DOI
10.1109/TPAMI.2003.1251149
CLC classification number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
The conventional wisdom in the field of statistical pattern recognition (SPR) is that the size of the finite test sample dominates the variance in the assessment of the performance of a classical or neural classifier. The present work shows that this result has only narrow applicability. In particular, when competing algorithms are compared, the finite training sample more commonly dominates this uncertainty. This general problem in SPR is analyzed using a formal structure recently developed for multivariate random-effects receiver operating characteristic (ROC) analysis. Monte Carlo trials within the general model are used to explore the detailed statistical structure of several representative problems in the subfield of computer-aided diagnosis in medicine. The scaling laws between variance of accuracy measures and number of training samples and number of test samples are investigated and found to be comparable to those discussed in the classic text of Fukunaga, but important interaction terms have been neglected by previous authors. Finally, the importance of the contribution of finite trainers to the uncertainties argues for some form of bootstrap analysis to sample that uncertainty. The leading contemporary candidate is an extension of the 0.632 bootstrap and associated error analysis, as opposed to the more commonly used cross-validation.
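The abstract argues for the 0.632 bootstrap (Efron & Tibshirani's estimator, reference [9] below) over cross-validation for sampling training-set uncertainty. A minimal sketch of that estimator follows; the nearest-mean classifier and the toy two-class Gaussian data are illustrative assumptions, not the classifiers or data studied in the paper.

```python
# Sketch of the 0.632 bootstrap error estimate:
#   err_632 = 0.368 * err_apparent + 0.632 * err_out_of_bootstrap
# The classifier here is a simple nearest-class-mean rule chosen only to
# keep the example self-contained; it is not the paper's method.
import numpy as np

rng = np.random.default_rng(0)

def nearest_mean_fit(X, y):
    # A minimal trainable classifier: store the mean of each class.
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def nearest_mean_predict(model, X):
    classes = np.array(sorted(model))
    d = np.stack([np.linalg.norm(X - model[c], axis=1) for c in classes])
    return classes[np.argmin(d, axis=0)]

def err632(X, y, B=100):
    n = len(y)
    # Apparent (resubstitution) error: train and test on the full sample.
    full = nearest_mean_fit(X, y)
    err_app = np.mean(nearest_mean_predict(full, X) != y)
    # Leave-one-out bootstrap error: each case is scored only by models
    # trained on bootstrap resamples that did not draw it.
    errs, counts = np.zeros(n), np.zeros(n)
    for _ in range(B):
        idx = rng.integers(0, n, n)                 # bootstrap resample
        out = np.setdiff1d(np.arange(n), idx)       # out-of-bootstrap cases
        if out.size == 0:
            continue
        model = nearest_mean_fit(X[idx], y[idx])
        errs[out] += nearest_mean_predict(model, X[out]) != y[out]
        counts[out] += 1
    err_oob = np.mean(errs[counts > 0] / counts[counts > 0])
    # The fixed weights trade off the optimistic apparent error against
    # the pessimistic out-of-bootstrap error.
    return 0.368 * err_app + 0.632 * err_oob

# Toy data: two 2D Gaussian classes with shifted means.
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(1.5, 1.0, (50, 2))])
y = np.repeat([0, 1], 50)
print(round(err632(X, y), 3))
```

Unlike k-fold cross-validation, each bootstrap replication resamples the training set itself, which is why this family of methods can capture the training-sample component of variance that the abstract emphasizes.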
Pages: 1561-1569
Page count: 9
Cited references
29 entries in total
[1]   Components-of-variance models and multiple-bootstrap experiments: An alternative method for random-effects, receiver operating characteristic analysis [J].
Beiden, SV ;
Wagner, RF ;
Campbell, G .
ACADEMIC RADIOLOGY, 2000, 7 (05) :341-349
[2]   Components-of-variance models for random-effects ROC analysis: The case of unequal variance structures across modalities [J].
Beiden, SV ;
Wagner, RF ;
Campbell, G ;
Metz, CE ;
Jiang, YL .
ACADEMIC RADIOLOGY, 2001, 8 (07) :605-615
[3]   Analysis of uncertainties in estimates of components of variance in multivariate ROC analysis [J].
Beiden, SV ;
Wagner, RF ;
Campbell, G ;
Chan, HP .
ACADEMIC RADIOLOGY, 2001, 8 (07) :616-622
[4]   The use of the area under the roc curve in the evaluation of machine learning algorithms [J].
Bradley, AP .
PATTERN RECOGNITION, 1997, 30 (07) :1145-1159
[5]   Classifier design for computer-aided diagnosis: Effects of finite sample size on the mean performance of classical and neural network classifiers [J].
Chan, HP ;
Sahiner, B ;
Wagner, RF ;
Petrick, N .
MEDICAL PHYSICS, 1999, 26 (12) :2654-2668
[6]   National Cancer Institute Initiative: Lung image database resource for imaging research [J].
Clarke, LP ;
Croft, BY ;
Staab, E ;
Baker, H ;
Sullivan, DC .
ACADEMIC RADIOLOGY, 2001, 8 (05) :447-450
[7]   Unpublished manuscript [J].
Dodd, LE .
ACADEMIC RADIOLOGY, 2003
[8]   Receiver operating characteristic rating analysis - Generalization to the population of readers and patients with the jackknife method [J].
Dorfman, DD ;
Berbaum, KS ;
Metz, CE .
INVESTIGATIVE RADIOLOGY, 1992, 27 (09) :723-731
[9]   Improvements on cross-validation: The .632+ bootstrap method [J].
Efron, B ;
Tibshirani, R .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1997, 92 (438) :548-560