Small-sample precision of ROC-related estimates

Times cited: 238

Authors:

Hanczar, Blaise [2]

Hua, Jianping [1]

Sima, Chao [1]

Weinstein, John [3]

Bittner, Michael [1]

Dougherty, Edward R. [1, 3, 4]

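Affiliations:

[1] Translational Genomics Research Institute, Computational Biology Division, Phoenix, AZ, USA

[2] Université Paris Descartes (Paris 5), LIPADE, Paris, France

[3] University of Texas MD Anderson Cancer Center, Department of Bioinformatics and Computational Biology, Houston, TX 77030, USA

[4] Texas A&M University, Department of Electrical and Computer Engineering, College Station, TX, USA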

Source:

BIOINFORMATICS | 2010 / Vol. 26 / Issue 6

Funding:

U.S. National Science Foundation;

Keywords:

GENE-EXPRESSION SIGNATURE; CLASSIFICATION; PERFORMANCE;

DOI:

10.1093/bioinformatics/btq037

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Discipline classification code:

071010 ; 081704 ;

Abstract:

Motivation: The receiver operator characteristic (ROC) curves are commonly used in biomedical applications to judge the performance of a discriminant across varying decision thresholds. The estimated ROC curve depends on the true positive rate (TPR) and false positive rate (FPR), with the key metric being the area under the curve (AUC). With small samples these rates need to be estimated from the training data, so a natural question arises: How well do the estimates of the AUC, TPR and FPR compare with the true metrics?

Results: Through a simulation study using data models and analysis of real microarray data, we show that (i) for small samples the root mean square differences of the estimated and true metrics are considerable; (ii) even for large samples, there is only weak correlation between the true and estimated metrics; and (iii) generally, there is weak regression of the true metric on the estimated metric. For classification rules, we consider linear discriminant analysis, linear support vector machine (SVM) and radial basis function SVM. For error estimation, we consider resubstitution, three kinds of cross-validation and bootstrap. Using resampling, we show the unreliability of some published ROC results.
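To make the quantities in the abstract concrete, here is a minimal Python sketch under assumed settings; it is not the authors' simulation code. It assumes a hypothetical two-class Gaussian model (identity covariance, means `MU0` and `MU1`), uses a simple nearest-mean linear discriminant in place of the LDA and SVM rules the paper studies, and uses resubstitution as the only estimator. It computes the empirical AUC (Mann–Whitney form), TPR and FPR at one threshold, and the true AUC of each trained discriminant in closed form. Across many simulated training sets it then reports the three comparisons named in the abstract: the root mean square difference, the correlation, and the slope of the regression of true AUC on estimated AUC. All names and parameter values (`DIM`, `N_PER_CLASS`, `N_REPS`, the helper functions) are illustrative assumptions.

```python
import math
import random

# Hypothetical simulation settings (assumptions, not taken from the paper).
DIM = 2
MU0 = [0.0] * DIM      # class-0 mean
MU1 = [1.0] * DIM      # class-1 mean; both classes use identity covariance
N_PER_CLASS = 10       # small training sample per class
N_REPS = 500           # number of simulated training sets


def empirical_auc(pos, neg):
    """Mann-Whitney estimate of AUC: P(score_pos > score_neg), ties count 1/2."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def tpr_fpr(pos, neg, threshold):
    """Rates at one decision threshold: predict positive when score >= threshold."""
    tpr = sum(s >= threshold for s in pos) / len(pos)
    fpr = sum(s >= threshold for s in neg) / len(neg)
    return tpr, fpr


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sample(mu, n, rng):
    return [[m + rng.gauss(0.0, 1.0) for m in mu] for _ in range(n)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def one_replicate(rng):
    """Draw one training set; return (resubstitution AUC, true AUC) of the trained rule."""
    x0 = sample(MU0, N_PER_CLASS, rng)
    x1 = sample(MU1, N_PER_CLASS, rng)
    # Hypothetical nearest-mean linear discriminant: score = w . x, w = mean1 - mean0.
    m0 = [sum(col) / len(x0) for col in zip(*x0)]
    m1 = [sum(col) / len(x1) for col in zip(*x1)]
    w = [b - a for a, b in zip(m0, m1)]
    # Resubstitution: empirical AUC on the same points used for training.
    est = empirical_auc([dot(w, x) for x in x1], [dot(w, x) for x in x0])
    # True AUC of this fixed discriminant under the Gaussian model:
    # w . x ~ N(w . mu_k, ||w||^2), so AUC = Phi(w . (mu1 - mu0) / (sqrt(2) ||w||)).
    delta = [b - a for a, b in zip(MU0, MU1)]
    true = normal_cdf(dot(w, delta) / (math.sqrt(2.0) * math.sqrt(dot(w, w))))
    return est, true


def summarize(est, true):
    """RMS difference, Pearson correlation, and slope of true regressed on estimated."""
    n = len(est)
    me, mt = sum(est) / n, sum(true) / n
    cov = sum((e - me) * (t - mt) for e, t in zip(est, true)) / n
    var_e = sum((e - me) ** 2 for e in est) / n
    var_t = sum((t - mt) ** 2 for t in true) / n
    rms = math.sqrt(sum((e - t) ** 2 for e, t in zip(est, true)) / n)
    corr = cov / math.sqrt(var_e * var_t)
    slope = cov / var_e
    return rms, corr, slope


if __name__ == "__main__":
    rng = random.Random(0)
    est, true = zip(*(one_replicate(rng) for _ in range(N_REPS)))
    rms, corr, slope = summarize(est, true)
    print(f"RMS = {rms:.3f}, corr = {corr:.3f}, slope = {slope:.3f}")
```

Varying `N_PER_CLASS` in this toy setup lets one observe how the three comparison measures respond to sample size; no particular values are claimed here, and the paper's own conclusions rest on its richer models, classifiers and estimators.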


Pages: 822–830

Page count: 9
