Optimality Driven Nearest Centroid Classification from Genomic Data

Cited by: 26
Authors
Dabney, Alan R. [1]
Storey, John D. [2, 3]
Affiliations
[1] Texas A&M Univ, Dept Stat, College Stn, TX 77843 USA
[2] Univ Washington, Dept Biostat, Seattle, WA 98195 USA
[3] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
Source
PLOS ONE | 2007 / Volume 2 / Issue 10
Keywords
DOI
10.1371/journal.pone.0001002
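Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract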
Nearest-centroid classifiers have recently been successfully employed in high-dimensional applications, such as in genomics. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is frequently carried out by computing univariate scores for each feature individually, without consideration for how a subset of features performs as a whole. We introduce a new feature selection approach for high-dimensional nearest centroid classifiers that instead is based on the theoretically optimal choice of a given number of features, which we determine directly here. This allows us to develop a new greedy algorithm to estimate this optimal nearest-centroid classifier with a given number of features. In addition, whereas the centroids are usually formed from maximum likelihood estimates, we investigate the applicability of high-dimensional shrinkage estimates of centroids. We apply the proposed method to clinical classification based on gene-expression microarrays, demonstrating that the proposed method can outperform existing nearest centroid classifiers.
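As a rough illustration of the idea in the abstract, the sketch below builds a nearest-centroid classifier and picks features greedily as a subset rather than by independent univariate scores. It is not the authors' algorithm: the selection criterion (maximize the smallest standardized squared distance between any pair of class centroids) is a simple stand-in for their optimality criterion, the centroids are plain class means with no shrinkage, and every function name and the synthetic data are hypothetical.

```python
import numpy as np

def fit_centroids(X, y):
    """Return sorted class labels, class means (K x p), and pooled per-feature variances (p,)."""
    classes = np.unique(y)
    centroids = np.vstack([X[y == c].mean(axis=0) for c in classes])
    # Pooled within-class variance per feature (diagonal covariance assumption).
    resid = X - centroids[np.searchsorted(classes, y)]
    var = (resid ** 2).sum(axis=0) / (len(y) - len(classes))
    return classes, centroids, np.maximum(var, 1e-12)

def greedy_select(centroids, var, n_features):
    """Stand-in criterion (an assumption, not the paper's): at each step add the
    feature that maximizes the smallest pairwise standardized squared distance
    between class centroids over the features selected so far."""
    K, p = centroids.shape
    pairs = [(i, j) for i in range(K) for j in range(i + 1, K)]
    # contrib[m, g]: what feature g adds to the distance of class pair m.
    contrib = np.array([(centroids[i] - centroids[j]) ** 2 / var for i, j in pairs])
    dist = np.zeros(len(pairs))
    used = np.zeros(p, dtype=bool)
    selected = []
    for _ in range(n_features):
        score = (dist[:, None] + contrib).min(axis=0)
        score[used] = -np.inf  # never pick a feature twice
        best = int(np.argmax(score))
        selected.append(best)
        used[best] = True
        dist += contrib[:, best]
    return selected

def predict(X, classes, centroids, var, selected):
    """Assign each row to the class whose centroid is nearest on the selected features."""
    Xs = X[:, selected]
    C = centroids[:, selected]
    d = ((Xs[:, None, :] - C[None, :, :]) ** 2 / var[selected]).sum(axis=2)
    return classes[d.argmin(axis=1)]

def make_demo_data(seed=0, n_per_class=30, p=20):
    """Synthetic data (hypothetical): 3 classes, only features 0 and 1 are informative."""
    rng = np.random.default_rng(seed)
    y = np.repeat(np.arange(3), n_per_class)
    X = rng.normal(size=(y.size, p))
    X[:, 0] += np.array([0.0, 4.0, 8.0])[y]
    X[:, 1] += np.array([0.0, 4.0, -4.0])[y]
    return X, y

if __name__ == "__main__":
    X, y = make_demo_data()
    classes, centroids, var = fit_centroids(X, y)
    selected = greedy_select(centroids, var, n_features=2)
    accuracy = (predict(X, classes, centroids, var, selected) == y).mean()
    print("selected features:", selected, "training accuracy:", accuracy)
```

Scoring a candidate feature by what it adds to the already selected subset, instead of ranking features one at a time, is the design point the abstract emphasizes; the particular score used here was chosen only for brevity.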
Pages: 7