HIGH-DIMENSIONAL ANALYSIS OF SEMIDEFINITE RELAXATIONS FOR SPARSE PRINCIPAL COMPONENTS

被引：114

作者：

Amini, Arash A. ^{[1
]}

Wainwright, Martin J. ^{[2
]}

机构：

[1] Univ Calif Berkeley, Dept Elect Engn & Comp Sci, Berkeley, CA 94720 USA

[2] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA

来源：

ANNALS OF STATISTICS | 2009年 / 37卷 / 5B期

关键词：

Principal component analysis; spectral analysis; spiked covariance ensembles; sparsity; high-dimensional statistics; convex relaxation; semidefinite programming; Wishart ensembles; random matrices; COVARIANCE; SELECTION; NORM;

D O I：

10.1214/08-AOS664

中图分类号：

O21 [概率论与数理统计]; C8 [统计学];

学科分类号：

020208 ; 070103 ; 0714 ;

摘要：

Principal component analysis (PCA) is a classical method for dimensionality reduction based oil extracting the dominant eigenvectors of the Sample covariance matrix. However, PCA is well known to behave poorly inw the "large p, small n" setting, in which the problem dimension p is comparable to or larger than the sample size n. This paper studies PICA ill this high-dimensional regime, but under the additional assumption that the maximal eigenvector is sparse, say, with at most k nonzero components. We consider a spiked covariance model in which a base matrix is perturbed by adding a k-sparse maximal eigenvector, and we analyze two computationally tractable methods for recovering the Support set of this maximal eigenvector, as follows: (a) a simple diagonal thresholding method. which transitions from success to failure as a function of the resealed sample size theta(dia)(n, p, k) = n/[k(2) log(p - k)]; and (b) a more sophisticated semidefinite programming (SDP) relaxation, which succeeds once the resealed sample Size theta(sdp)(n, p, k) = n/[k log(p - k)] is larger than a critical threshold. In addition, we prove that no method, including the best method which has exponential-time complexity, can Succeed in recovering the Support if the order parameter theta(sdp)(n, p, k) is below a threshold. Our results thus highlight,in interesting trade-oft between computational and statistical efficiency in high-dimensional inference.

引用

页码：2877 / 2921

页数：45

共 41 条

[1] Anderson T. W., 1984, An introduction to multivariate statistical analysis, V499
[2] [Anonymous], 2013, MATRIX ANAL
[3] [Anonymous], 2000, TABLES INTEGRALS SER
[4] [Anonymous], 1991, ELEMENTS INFORM THEO
[5] [Anonymous], 1996, Die Grundlehren der mathematischen Wissenschaften
[6] [Anonymous], 2004, PRINCIPAL COMPONENT
[7] Eigenvalues of large sample covariance matrices of spiked population models
Baik, Jinho
Silverstein, Jack W.
[J]. JOURNAL OF MULTIVARIATE ANALYSIS, 2006, 97 (06) : 1382 - 1408
[8] Regularized estimation of large covariance matrices
Bickel, Peter J.
Levina, Elizaveta
[J]. ANNALS OF STATISTICS, 2008, 36 (01) : 199 - 227
[9] BIRGE L, 2001, IMS LECT NOTES, V37, P113
[10] Buldygin V., 2000, Transl. Math. Monogr.

← 1 2 3 4 5 →