HIGH-DIMENSIONAL CLASSIFICATION USING FEATURES ANNEALED INDEPENDENCE RULES

Cited by: 365
Authors
Fan, Jianqing [1 ]
Fan, Yingying [2 ,3 ]
Affiliations
[1] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08544 USA
[2] Univ So Calif, Informat & Operat Management Dept, Marshall Sch Business, Los Angeles, CA 90089 USA
[3] Harvard Univ, Cambridge, MA 02138 USA
Keywords
Classification; feature extraction; high dimensionality; independence rule; misclassification rates
DOI
10.1214/07-AOS504
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline Classification Code
020208; 070103; 0714
Abstract
Classification using high-dimensional features arises frequently in many contemporary statistical studies, such as tumor classification using microarray or other high-throughput data. The impact of dimensionality on classification is poorly understood. In a seminal paper, Bickel and Levina [Bernoulli 10 (2004) 989-1010] show that the Fisher discriminant performs poorly due to diverging spectra, and they propose using the independence rule to overcome the problem. We first demonstrate that even for the independence classification rule, classification using all the features can be as poor as random guessing, owing to noise accumulation in estimating population centroids in high-dimensional feature space. In fact, we demonstrate further that almost all linear discriminants can perform as poorly as random guessing. Thus, it is important to select a subset of important features for high-dimensional classification, resulting in Features Annealed Independence Rules (FAIR). The conditions under which all the important features can be selected by the two-sample t-statistic are established. The choice of the optimal number of features, or equivalently, the threshold value of the test statistic, is proposed based on an upper bound of the classification error. Simulation studies and real data analysis support our theoretical results and convincingly demonstrate the advantage of our new classification procedure.
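The procedure the abstract describes admits a compact illustration: rank features by two-sample t-statistics, keep the top m, and classify with the independence (diagonal-covariance) rule. The Python sketch below follows that outline under stated assumptions; the paper chooses m from an upper bound on the classification error, whereas this sketch substitutes a simple held-out-error search for m, so the selection step is an illustrative stand-in rather than the authors' criterion, and the simulated data and function names are hypothetical.

import numpy as np

rng = np.random.default_rng(0)

def two_sample_t(X1, X2):
    # Componentwise two-sample t-statistics (Welch form), one per feature.
    n1, n2 = X1.shape[0], X2.shape[0]
    num = X1.mean(axis=0) - X2.mean(axis=0)
    den = np.sqrt(X1.var(axis=0, ddof=1) / n1 + X2.var(axis=0, ddof=1) / n2)
    return num / den

def independence_rule_fit(X1, X2, feats):
    # Independence rule: a linear discriminant built from a diagonal
    # covariance estimate (pooled per-feature variances), restricted to feats.
    n1, n2 = X1.shape[0], X2.shape[0]
    mu1, mu2 = X1.mean(axis=0)[feats], X2.mean(axis=0)[feats]
    s2 = ((n1 - 1) * X1.var(axis=0, ddof=1)[feats] +
          (n2 - 1) * X2.var(axis=0, ddof=1)[feats]) / (n1 + n2 - 2)
    def predict(X):
        d = (X[:, feats] - (mu1 + mu2) / 2) @ ((mu1 - mu2) / s2)
        return np.where(d > 0, 1, 2)  # assign class 1 iff discriminant > 0
    return predict

# Toy data: p = 1000 features, only the first 10 carry signal.
p, n1, n2 = 1000, 30, 30
mu = np.zeros(p); mu[:10] = 1.0
X1 = rng.normal(mu, 1.0, size=(n1, p))   # class 1
X2 = rng.normal(0.0, 1.0, size=(n2, p))  # class 2

# Rank features by |t| on a training split, then pick m by held-out error.
order = np.argsort(-np.abs(two_sample_t(X1[:20], X2[:20])))
best_m, best_err = None, np.inf
for m in (1, 2, 5, 10, 20, 50, 100, p):
    predict = independence_rule_fit(X1[:20], X2[:20], order[:m])
    err = np.mean(predict(X1[20:]) != 1) + np.mean(predict(X2[20:]) != 2)
    if err < best_err:
        best_m, best_err = m, err
print(f"selected m = {best_m} of {p} features")

In runs of this sketch, using all p features (the last candidate) typically loses to a small m, which is the noise-accumulation phenomenon the abstract describes.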
Pages: 2605-2637
Page count: 33
Related Papers
30 records in total
[1] Antoniadis, A., Lambert-Lacroix, S. and Leblanc, F. Effective dimension reduction methods for tumor classification using gene expression data. Bioinformatics, 2003, 19(5): 563-570.
[2] Bai, Z. D. Statistica Sinica, 1996, 6: 311.
[3] Bair, E., Hastie, T., Paul, D. and Tibshirani, R. Prediction by supervised principal components. Journal of the American Statistical Association, 2006, 101(473): 119-137.
[4] Bickel, P. J. and Levina, E. Some theory for Fisher's linear discriminant function, 'naive Bayes', and some alternatives when there are many more variables than observations. Bernoulli, 2004, 10(6): 989-1010.
[5] Boulesteix, A. L. Statistical Applications in Genetics and Molecular Biology, 2004, 3(1): Article 33. DOI: 10.2202/1544-6115.1075.
[6] Bura, E. and Pfeiffer, R. M. Graphical methods for class prediction using dimension reduction techniques on DNA microarray data. Bioinformatics, 2003, 19(10): 1252-1258.
[7] Cao, H. ESAIM: Probability and Statistics, 2007, 11: 264.
[8] Chiaromonte, F. and Martinelli, J. Dimension reduction strategies for analyzing global gene expression data with a response. Mathematical Biosciences, 2002, 176(1): 123-144.
[9] Dettling, M. and Bühlmann, P. Boosting for tumor classification with gene expression data. Bioinformatics, 2003, 19(9): 1061-1069.
[10] Dudoit, S., Fridlyand, J. and Speed, T. P. Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 2002, 97(457): 77-87.