Computationally efficient confidence intervals for cross-validated area under the ROC curve estimates

Cited by: 158
Authors
LeDell, Erin [1]
Petersen, Maya [1]
van der Laan, Mark [1]
Affiliations
[1] Univ Calif Berkeley, Div Biostat, Berkeley, CA 94720 USA
Source
Electronic Journal of Statistics | 2015 / Vol. 9 / No. 1
Funding
National Institutes of Health (USA);
Keywords
AUC; binary classification; confidence intervals; cross-validation; influence curve; influence function; machine learning; model selection; ROC; variance estimation; SELECTION;
DOI
10.1214/15-EJS1035
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject classification codes
020208 ; 070103 ; 0714 ;
Abstract
In binary classification problems, the area under the ROC curve (AUC) is commonly used to evaluate the performance of a prediction model. Often, it is combined with cross-validation in order to assess how the results will generalize to an independent data set. In order to evaluate the quality of an estimate for cross-validated AUC, we obtain an estimate of its variance. For massive data sets, the process of generating a single performance estimate can be computationally expensive. Additionally, when using a complex prediction method, the process of cross-validating a predictive model on even a relatively small data set can still require a large amount of computation time. Thus, in many practical settings, the bootstrap is a computationally intractable approach to variance estimation. As an alternative to the bootstrap, we demonstrate a computationally efficient influence curve based approach to obtaining a variance estimate for cross-validated AUC.
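The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard influence function of the empirical (Mann-Whitney) AUC and applies it to a single sample rather than fold by fold, so the cross-validation step is left out. The function name `auc_with_ic_variance`, the tie handling (ties count one half), and the 1.96 normal quantile are illustrative choices.

```python
import numpy as np

def auc_with_ic_variance(scores, labels):
    """Empirical AUC and an influence-curve-based variance estimate.

    Simplified single-sample sketch (no cross-validation folds).
    labels must be 0/1; higher scores should indicate the positive class.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n = labels.size
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    p1 = pos.size / n  # empirical P(Y = 1)
    p0 = neg.size / n  # empirical P(Y = 0)

    neg_sorted = np.sort(neg)
    pos_sorted = np.sort(pos)

    # For each positive: fraction of negatives scored lower (ties count 1/2).
    lo = np.searchsorted(neg_sorted, pos, side="left")
    hi = np.searchsorted(neg_sorted, pos, side="right")
    frac_neg_below = (lo + hi) / (2.0 * neg.size)

    # For each negative: fraction of positives scored higher (ties count 1/2).
    lo = np.searchsorted(pos_sorted, neg, side="left")
    hi = np.searchsorted(pos_sorted, neg, side="right")
    frac_pos_above = 1.0 - (lo + hi) / (2.0 * pos.size)

    auc = frac_neg_below.mean()

    # Influence-curve value per observation; these average to zero.
    ic = np.empty(n)
    ic[labels == 1] = (frac_neg_below - auc) / p1
    ic[labels == 0] = (frac_pos_above - auc) / p0

    # Variance of the AUC estimate: mean squared influence curve over n.
    var = np.mean(ic ** 2) / n
    return auc, var

if __name__ == "__main__":
    auc, var = auc_with_ic_variance([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
    se = np.sqrt(var)
    # Normal-approximation 95% interval; it can extend outside [0, 1] for tiny samples.
    print(auc, (auc - 1.96 * se, auc + 1.96 * se))
```

The estimate needs only one sort per class and one pass over the data, whereas a bootstrap repeats the whole model-fitting procedure for every resample.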
Pages: 1583-1607
Number of pages: 25