On Consistency and Sparsity for Principal Components Analysis in High Dimensions

被引:520
作者
Johnstone, Iain M. [1 ]
Lu, Arthur Yu [2 ]
机构
[1] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[2] Renaissance Technol LLC, E Setauket, NY 11733 USA
基金
美国国家科学基金会;
关键词
Eigenvector estimation; Reduction of dimension; Regularization; Thresholding; Variable selection; FEATURE-SELECTION; PCA;
D O I
10.1198/jasa.2009.0121
中图分类号
O21 [概率论与数理统计]; C8 [统计学];
学科分类号
020208 ; 070103 ; 0714 ;
摘要
Principal components analysis (PCA) is a classic method for the reduction of dimensionality of data in the form of n observations (or cases) of a vector with p variables. Contemporary datasets often have p comparable with or even much larger than n. Our main assertions, in such settings, are (a) that some initial reduction in dimensionality is desirable before applying any PCA-type search for principal modes, and (b) the initial reduction in dimensionality is best achieved by working in a basis in which the signals have a sparse representation. We describe a simple asymptotic model in which the estimate of the leading, principal component vector via standard PCA is consistent if and only if p(n)/n -> 0. We provide a simple algorithm for selecting it subset of coordinates with largest sample variances, and show that if PCA is done on the selected subset, then consistency is recovered, even if p(n) >> n.
引用
收藏
页码:682 / 693
页数:12
相关论文
共 37 条
[1]  
Amini ArashA., 2009, Annals of Statistics
[2]   ASYMPTOTIC THEORY FOR PRINCIPAL COMPONENT ANALYSIS [J].
ANDERSON, TW .
ANNALS OF MATHEMATICAL STATISTICS, 1963, 34 (01) :122-&
[3]  
[Anonymous], 1993, Large deviations techniques and applications
[4]  
[Anonymous], 1999, A Wavelet Tour of Signal Processing
[5]  
[Anonymous], 2002, THESIS STANFORD U
[6]   STATISTICAL-MECHANICS OF UNSUPERVISED STRUCTURE RECOGNITION [J].
BIEHL, M ;
MIETZNER, A .
JOURNAL OF PHYSICS A-MATHEMATICAL AND GENERAL, 1994, 27 (06) :1885-1897
[7]  
Cherkassky V, 2007, LEARNING DATA CONCEP
[8]   A direct formulation for sparse PCA using semidefinite programming [J].
d'Aspremont, Alexandre ;
El Ghaoui, Laurent ;
Jordan, Michael I. ;
Lanckriet, Gert R. G. .
SIAM REVIEW, 2007, 49 (03) :434-448
[9]  
Donoho D. L., 1993, Applied and Computational Harmonic Analysis, V1, P100, DOI 10.1006/acha.1993.1008
[10]  
DONOHO DL, 1995, J ROY STAT SOC B MET, V57, P301