On the sampling distribution of resubstitution and leave-one-out error estimators for linear classifiers

Cited by: 30
Authors
Zollanvari, Amin [1]
Braga-Neto, Ulisses M. [1]
Dougherty, Edward R. [1, 2, 3]
Affiliations
[1] Texas A&M Univ, Dept Elect & Comp Engn, College Stn, TX 77843 USA
[2] Translat Genom Res Inst, Computat Biol Div, Phoenix, AZ USA
[3] Univ Texas MD Anderson Canc Ctr, Dept Pathol, Houston, TX 77030 USA
Funding
National Science Foundation (USA);
Keywords
Error estimation; Parametric classification; Linear discriminant analysis; Sampling distribution; Resubstitution; Leave-one-out; CLINICAL BEHAVIOR; QUADRATIC-FORMS; OVARIAN-CANCER; EXPRESSION; CLASSIFICATION; MICROARRAY; PREDICTION;
DOI
10.1016/j.patcog.2009.05.003
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Error estimation is a problem of high current interest in many areas of application. This paper concerns the classical problem of determining the performance of error estimators in small-sample settings under a Gaussianity parametric assumption. We provide here for the first time the exact sampling distribution of the resubstitution and leave-one-out error estimators for linear discriminant analysis (LDA) in the univariate case, which is valid for any sample size and combination of parameters (including unequal variances and sample sizes for each class). In the multivariate case, we provide a quasi-binomial approximation to the distribution of both the resubstitution and leave-one-out error estimators for LDA, under a common but otherwise arbitrary class covariance matrix, which is assumed to be known in the design of the LDA discriminant. We provide numerical examples, using both synthetic and real data, that indicate that these approximations are accurate, provided that the LDA classification error is not too large. (C) 2009 Elsevier Ltd. All rights reserved.
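To make the two estimators named in the abstract concrete, the following is a minimal sketch of how resubstitution and leave-one-out error counts might be computed for a univariate LDA rule. It is not the authors' code and does not reproduce their sampling-distribution results. It assumes a discriminant of the form W(x) = (x - (m0 + m1)/2)(m0 - m1) with a zero threshold (equal priors), where m0 and m1 are the class sample means; in one dimension the positive variance factor does not change the sign of W, so it is omitted. The function names and the toy data are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of floats."""
    return sum(values) / len(values)


def lda_label(x, m0, m1):
    """Assign x to class 0 if the (assumed) discriminant W(x) >= 0, else class 1."""
    w = (x - 0.5 * (m0 + m1)) * (m0 - m1)
    return 0 if w >= 0 else 1


def resubstitution_error(x0, x1):
    """Fraction of training points misclassified by the rule designed on all of them."""
    m0, m1 = mean(x0), mean(x1)
    errors = sum(lda_label(x, m0, m1) != 0 for x in x0)
    errors += sum(lda_label(x, m0, m1) != 1 for x in x1)
    return errors / (len(x0) + len(x1))


def leave_one_out_error(x0, x1):
    """Fraction of points misclassified when each is held out of the design set.

    Requires at least two points per class so that a class mean still exists
    after one point is removed.
    """
    errors = 0
    for i, x in enumerate(x0):
        rest = x0[:i] + x0[i + 1:]
        errors += lda_label(x, mean(rest), mean(x1)) != 0
    for i, x in enumerate(x1):
        rest = x1[:i] + x1[i + 1:]
        errors += lda_label(x, mean(x0), mean(rest)) != 1
    return errors / (len(x0) + len(x1))


# Toy data (hypothetical): the two classes are barely separated.
class0 = [0.0, 1.0, 2.9]
class1 = [3.1, 5.0, 6.0]
resub = resubstitution_error(class0, class1)
loo = leave_one_out_error(class0, class1)
```

On this toy sample the resubstitution count is zero while leave-one-out flags the two borderline points, which illustrates the optimistic bias of resubstitution that motivates studying both estimators in small samples.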
Pages: 2705-2723
Page count: 19
Related papers
41 in total
[1]  
Anderson T.W., 1951, PSYCHOMETRIKA, V16, P31
[2]  
[Anonymous], 2013, A Probabilistic Theory of Pattern Recognition
[3]   Classifier performance as a function of distributional complexity [J].
Attoor, SN ;
Dougherty, ER .
PATTERN RECOGNITION, 2004, 37 (08) :1641-1651
[4]   Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses [J].
Bhattacharjee, A ;
Richards, WG ;
Staunton, J ;
Li, C ;
Monti, S ;
Vasa, P ;
Ladd, C ;
Beheshti, J ;
Bueno, R ;
Gillette, M ;
Loda, M ;
Weber, G ;
Mark, EJ ;
Lander, ES ;
Wong, W ;
Johnson, BE ;
Golub, TR ;
Sugarbaker, DJ ;
Meyerson, M .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2001, 98 (24) :13790-13795
[5]  
Bowker A., 1961, STUDIES ITEM ANAL PR, P285
[6]  
BUTLER K, 1993, ADA266969 STANF U DE
[7]   Gene expression profile of adult T-cell acute lymphocytic leukemia identifies distinct subsets of patients with different response to therapy and survival [J].
Chiaretti, S ;
Li, XC ;
Gentleman, R ;
Vitale, A ;
Vignetti, M ;
Mandelli, F ;
Ritz, J ;
Foa, R .
BLOOD, 2004, 103 (07) :2771-2778
[8]   Predicting the clinical behavior of ovarian cancer from gene expression profiles [J].
De Smet, F ;
Pochet, NLMM ;
Engelen, K ;
Van Gorp, T ;
Van Hummelen, P ;
Marchal, K ;
Amant, F ;
Timmerman, D ;
De Moor, BLR ;
Vergote, IB .
INTERNATIONAL JOURNAL OF GYNECOLOGICAL CANCER, 2006, 16 :147-151
[9]   Validation of computational methods in genomics [J].
Dougherty, Edward R. ;
Hua, Jianping ;
Bittner, Michael L. .
CURRENT GENOMICS, 2007, 8 (01) :1-19
[10]   Epistemology of computational biology: Mathematical models and experimental prediction as the basis of their validity [J].
Dougherty, ER ;
Braga-Neto, U .
JOURNAL OF BIOLOGICAL SYSTEMS, 2006, 14 (01) :65-90