Classifier variability: Accounting for training and testing

Cited by: 16
Authors
Chen, Weijie [1]
Gallas, Brandon D. [1]
Yousef, Waleed A. [2]
Affiliations
[1] US FDA, Off Sci & Engn Labs, Ctr Devices & Radiol Hlth, Silver Spring, MD 20993 USA
[2] Helwan Univ, Fac Comp & Informat, Human Comp Interact Lab, Cairo, Egypt
Keywords
Classifier evaluation; Training variability; Classifier stability; U-statistics; AUC; ROC ANALYSIS; VARIANCE; VALIDATION; BOOTSTRAP; AREA; PREDICTION
DOI
10.1016/j.patcog.2011.12.024
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We categorize the statistical assessment of classifiers into three levels: (1) assessing the classification performance and its testing variability conditional on a fixed training set; (2) assessing the performance and its variability that accounts for both training and testing; and (3) assessing the performance averaged over training sets and its variability that accounts for both training and testing. We derive analytical expressions for the variance of the estimated AUC and provide freely available software implemented with an efficient computation algorithm. Our approach can be applied to assess any classifier that has ordinal (continuous or discrete) outputs. Applications to simulated and real datasets are presented to illustrate our methods. Published by Elsevier Ltd.
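As a rough illustration of the quantity the abstract refers to, the empirical AUC for ordinal outputs can be written as a two-sample U-statistic (the Mann-Whitney form), with tied scores counted as one half. The sketch below is not the authors' software and does not reproduce their variance expressions; the function name `auc_mann_whitney` and the toy scores are assumptions made for this example only.

```python
from itertools import product


def auc_mann_whitney(neg_scores, pos_scores):
    """Empirical AUC as a two-sample U-statistic.

    Each (negative, positive) pair contributes 1 if the positive case
    scores higher, 0.5 if the two scores tie, and 0 otherwise. The tie
    rule is what lets the estimate handle discrete ordinal outputs.
    """
    if not neg_scores or not pos_scores:
        raise ValueError("need at least one score per class")
    total = 0.0
    for x, y in product(neg_scores, pos_scores):
        if y > x:
            total += 1.0
        elif y == x:
            total += 0.5
    return total / (len(neg_scores) * len(pos_scores))


# Hypothetical discrete scores for one fixed trained classifier.
neg = [1, 2, 2, 3]   # outputs on actually-negative test cases
pos = [2, 3, 4, 4]   # outputs on actually-positive test cases
auc = auc_mann_whitney(neg, pos)
print(auc)  # 0.84375
```

This value corresponds only to the first level described above, performance on one test set conditional on a fixed training set. Accounting for training variability would require repeating the estimate over multiple training sets, which this sketch does not attempt.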
Pages: 2661-2671
Page count: 11