Large covariance estimation by thresholding principal orthogonal complements

Cited by: 498

Authors:

Fan, Jianqing [1]

Liao, Yuan [2]

Mincheva, Martina [3]

Affiliations:

[1] Princeton Univ, Princeton, NJ 08544 USA

[2] Univ Maryland, College Pk, MD 20742 USA

[3] Princeton Univ, Princeton, NJ 08544 USA

Source:

JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY | 2013 / Vol. 75 / No. 4

Funding:

Engineering and Physical Sciences Research Council (UK); National Institutes of Health (USA);

Keywords:

Approximate factor model; Cross-sectional correlation; Diverging eigenvalues; High dimensionality; Low rank matrix; Principal components; Sparse matrix; Thresholding; Unknown factors; DYNAMIC-FACTOR MODEL; HIGH-DIMENSION; MATRIX DECOMPOSITION; PORTFOLIO SELECTION; COMPONENTS-ANALYSIS; LARGEST EIGENVALUE; FALSE DISCOVERY; OPTIMAL RATES; LARGE NUMBER; CONSISTENCY;

DOI:

10.1111/rssb.12016

Chinese Library Classification:

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];

Discipline classification codes:

020208 ; 070103 ; 0714 ;

Abstract:

The paper deals with the estimation of a high dimensional covariance with a conditional sparsity structure and fast diverging eigenvalues. By assuming a sparse error covariance matrix in an approximate factor model, we allow for the presence of some cross-sectional correlation even after taking out common but unobservable factors. We introduce the principal orthogonal complement thresholding method 'POET' to explore such an approximate factor structure with sparsity. The POET-estimator includes the sample covariance matrix, the factor-based covariance matrix, the thresholding estimator and the adaptive thresholding estimator as specific examples. We provide mathematical insights when the factor analysis is approximately the same as the principal component analysis for high dimensional data. The rates of convergence of the sparse residual covariance matrix and the conditional sparse covariance matrix are studied under various norms. It is shown that the effect of estimating the unknown factors vanishes as the dimensionality increases. The uniform rates of convergence for the unobserved factors and their factor loadings are derived. The asymptotic results are also verified by extensive simulation studies. Finally, a real data application on portfolio allocation is presented.
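
As a rough illustration of the idea described in the abstract, the following Python sketch keeps the leading principal components of the sample covariance and thresholds the remaining "principal orthogonal complement". It is a simplified sketch under stated assumptions, not the authors' implementation: the function name `poet_sketch`, the parameters `K` (assumed number of factors) and `tau` (a single constant hard threshold) are hypothetical, and the adaptive, entry-dependent thresholds mentioned in the abstract are replaced here by one constant for brevity.

```python
import numpy as np

def poet_sketch(X, K, tau):
    """Simplified POET-style covariance estimate (illustrative only).

    X   : (T, p) data matrix, one observation per row.
    K   : assumed number of common factors (hypothetical parameter).
    tau : constant hard threshold for off-diagonal residual entries
          (an assumption; the abstract also mentions adaptive thresholding).
    """
    T = X.shape[0]
    Xc = X - X.mean(axis=0)                 # centre each variable
    S = Xc.T @ Xc / T                       # sample covariance (1/T scaling)

    # Eigen-decomposition of the symmetric sample covariance.
    eigvals, eigvecs = np.linalg.eigh(S)
    top = np.argsort(eigvals)[::-1][:K]     # indices of the K largest eigenvalues
    V, lam = eigvecs[:, top], eigvals[top]

    low_rank = (V * lam) @ V.T              # part explained by the first K components
    R = S - low_rank                        # principal orthogonal complement

    # Hard-threshold the off-diagonal entries; always keep the diagonal.
    R_thr = np.where(np.abs(R) >= tau, R, 0.0)
    np.fill_diagonal(R_thr, np.diag(R))

    return low_rank + R_thr

# Usage on synthetic data (values are arbitrary, for illustration only).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
Sigma_hat = poet_sketch(X, K=2, tau=0.1)
```

In this sketch, setting `tau = 0` leaves the residual untouched and returns the sample covariance, `K = 0` reduces to plain thresholding of the sample covariance, and a very large `tau` keeps only a diagonal residual, which corresponds to the special cases listed in the abstract.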


Pages: 603-680

Page count: 78
