Estimating misclassification error with small samples via bootstrap cross-validation

Cited by: 110
Authors
Fu, WJJ [1]
Carroll, RJ [1]
Wang, SJ [1]
Affiliations
[1] Texas A&M Univ, Dept Stat, College Stn, TX 77843 USA
DOI
10.1093/bioinformatics/bti294
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Estimation of misclassification error has received increasing attention in clinical diagnosis and bioinformatics studies, especially in small-sample studies with microarray data. Current error estimation methods are not satisfactory because they have either large variability (such as leave-one-out cross-validation) or large bias (such as resubstitution and leave-one-out bootstrap). Because small sample size remains a key feature of costly clinical investigations and of microarray studies with limited funding, time and tissue materials, accurate and easy-to-implement error estimation methods for small samples are desirable.
Results: A bootstrap cross-validation method is studied. It achieves accurate error estimation through a simple procedure with bootstrap resampling, at the cost of only computer CPU time. Simulation studies and applications to microarray data demonstrate that it performs consistently better than its competitors. The method has several attractive properties: (1) it is implemented through a simple procedure; (2) it performs well for small samples, with sample sizes as small as 16; (3) it is not restricted to any particular classification rule and thus applies to many parametric and non-parametric methods.
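The abstract describes the method only as a simple procedure with bootstrap resampling. One plausible reading, sketched below, is to draw bootstrap samples from the data, run leave-one-out cross-validation on each, and average the resulting error rates. The details here are assumptions for illustration and are not taken from the abstract: the function names, the nearest-centroid stand-in classifier, the default of 50 bootstrap replicates, and the rule that each replicate must contain at least 3 distinct observations per class.

```python
import numpy as np


def nearest_centroid_predict(X_train, y_train, X_test):
    """Hypothetical stand-in classifier: pick the class with the nearest training mean."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every test point to every class centroid.
    dist = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]


def loocv_error(X, y, predict):
    """Leave-one-out cross-validation error rate for a given prediction rule."""
    n = len(y)
    wrong = 0
    for i in range(n):
        keep = np.ones(n, dtype=bool)
        keep[i] = False  # hold out row i, train on the rest
        pred = predict(X[keep], y[keep], X[i:i + 1])[0]
        wrong += int(pred != y[i])
    return wrong / n


def bootstrap_cv_error(X, y, predict, n_boot=50, min_per_class=3, seed=0):
    """Average the leave-one-out error over bootstrap replicates (assumed procedure)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    classes = np.unique(y)
    if any((y == c).sum() < min_per_class for c in classes):
        raise ValueError("each class needs at least min_per_class observations")
    errors = []
    while len(errors) < n_boot:
        idx = rng.integers(0, n, size=n)  # sample n rows with replacement
        distinct = np.unique(idx)
        # Assumed safeguard: redraw replicates with too few distinct rows in a class.
        if any((y[distinct] == c).sum() < min_per_class for c in classes):
            continue
        errors.append(loocv_error(X[idx], y[idx], predict))
    return float(np.mean(errors))


if __name__ == "__main__":
    # Synthetic two-class data with 16 samples, the smallest size the abstract mentions.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0.0, 1.0, (8, 5)), rng.normal(1.0, 1.0, (8, 5))])
    y = np.array([0] * 8 + [1] * 8)
    print(bootstrap_cv_error(X, y, nearest_centroid_predict))
```

Because the prediction rule is passed in as a function, any classifier with the same three-argument signature can be substituted, which mirrors the abstract's point that the method is not tied to a particular classification rule.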
Pages: 1979-1986
Page count: 8