OPTIMAL DETECTION OF SPARSE PRINCIPAL COMPONENTS IN HIGH DIMENSION

被引:151
作者
Berthet, Quentin [1 ]
Rigollet, Philippe [1 ]
机构
[1] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08544 USA
基金
美国国家科学基金会;
关键词
High-dimensional detection; sparse principal component analysis; spiked covariance model; semidefinite relaxation; minimax lower bounds; planted clique; LARGE HIDDEN CLIQUE; LARGEST EIGENVALUE; NORM; SIZE; PCA; SUBMATRICES; RELAXATIONS; CONSISTENCY; MATRICES; NOISY;
D O I
10.1214/13-AOS1127
中图分类号
O21 [概率论与数理统计]; C8 [统计学];
学科分类号
020208 ; 070103 ; 0714 ;
摘要
We perform a finite sample analysis of the detection levels for sparse principal components of a high-dimensional covariance matrix. Our minimax optimal test is based on a sparse eigenvalue statistic. Alas, computing this test is known to be NP-complete in general, and we describe a computationally efficient alternative test using convex relaxations. Our relaxation is also proved to detect sparse principal components at near optimal detection levels, and it performs well on simulated datasets. Moreover, using polynomial time reductions from theoretical computer science, we bring significant evidence that our results cannot be improved, thus revealing an inherent trade off between statistical and computational performance.
引用
收藏
页码:1780 / 1815
页数:36
相关论文
共 76 条
[1]   ON COMBINATORIAL TESTING PROBLEMS [J].
Addario-Berry, Louigi ;
Broutin, Nicolas ;
Devroye, Luc ;
Lugosi, Gabor .
ANNALS OF STATISTICS, 2010, 38 (05) :3063-3092
[2]  
Alon N, 1998, RANDOM STRUCT ALGOR, V13, P457, DOI 10.1002/(SICI)1098-2418(199810/12)13:3/4<457::AID-RSA14>3.0.CO
[3]  
2-W
[4]  
Alon N., 2011, INAPPROXIMABIL UNPUB
[5]   Testing k-wise and Almost k-wise Independence [J].
Alon, Noga ;
Andoni, Alexandr ;
Kaufman, Tali ;
Matulef, Kevin ;
Rubinfeld, Ronitt ;
Xie, Ning .
STOC 07: PROCEEDINGS OF THE 39TH ANNUAL ACM SYMPOSIUM ON THEORY OF COMPUTING, 2007, :496-505
[6]   Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays [J].
Alon, U ;
Barkai, N ;
Notterman, DA ;
Gish, K ;
Ybarra, S ;
Mack, D ;
Levine, AJ .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1999, 96 (12) :6745-6750
[7]   Nuclear norm minimization for the planted clique and biclique problems [J].
Ames, Brendan P. W. ;
Vavasis, Stephen A. .
MATHEMATICAL PROGRAMMING, 2011, 129 (01) :69-89
[8]   HIGH-DIMENSIONAL ANALYSIS OF SEMIDEFINITE RELAXATIONS FOR SPARSE PRINCIPAL COMPONENTS [J].
Amini, Arash A. ;
Wainwright, Martin J. .
ANNALS OF STATISTICS, 2009, 37 (5B) :2877-2921
[9]  
[Anonymous], 2010, P 13 INT C ART INT S
[10]  
[Anonymous], 2003, INTRO LECT CONVEX OP