Large covariance estimation by thresholding principal orthogonal complements

Cited by: 498
Authors
Fan, Jianqing [1]
Liao, Yuan [2]
Mincheva, Martina [3]
Affiliations
[1] Princeton Univ, Princeton, NJ 08544 USA
[2] Univ Maryland, College Pk, MD 20742 USA
[3] Princeton Univ, Princeton, NJ 08544 USA
Funding
Engineering and Physical Sciences Research Council (UK); National Institutes of Health (USA)
Keywords
Approximate factor model; Cross-sectional correlation; Diverging eigenvalues; High dimensionality; Low rank matrix; Principal components; Sparse matrix; Thresholding; Unknown factors; DYNAMIC-FACTOR MODEL; HIGH-DIMENSION; MATRIX DECOMPOSITION; PORTFOLIO SELECTION; COMPONENTS-ANALYSIS; LARGEST EIGENVALUE; FALSE DISCOVERY; OPTIMAL RATES; LARGE NUMBER; CONSISTENCY;
DOI
10.1111/rssb.12016
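Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification codes
020208 ; 070103 ; 0714 ;
Abstract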
The paper deals with the estimation of a high dimensional covariance with a conditional sparsity structure and fast diverging eigenvalues. By assuming a sparse error covariance matrix in an approximate factor model, we allow for the presence of some cross-sectional correlation even after taking out common but unobservable factors. We introduce the principal orthogonal complement thresholding method 'POET' to explore such an approximate factor structure with sparsity. The POET-estimator includes the sample covariance matrix, the factor-based covariance matrix, the thresholding estimator and the adaptive thresholding estimator as specific examples. We provide mathematical insights when the factor analysis is approximately the same as the principal component analysis for high dimensional data. The rates of convergence of the sparse residual covariance matrix and the conditional sparse covariance matrix are studied under various norms. It is shown that the effect of estimating the unknown factors vanishes as the dimensionality increases. The uniform rates of convergence for the unobserved factors and their factor loadings are derived. The asymptotic results are also verified by extensive simulation studies. Finally, a real data application on portfolio allocation is presented.
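The idea summarised in the abstract can be illustrated with a minimal sketch: keep the leading principal components of the sample covariance as the low-rank part, then threshold what remains (the principal orthogonal complement). This is an illustration under stated assumptions, not the authors' implementation. The function name `poet_sketch`, the number of factors `K`, and the single constant soft threshold `tau` are all hypothetical choices made here; the abstract also mentions adaptive thresholding, which this sketch does not attempt.

```python
import numpy as np

def poet_sketch(X, K, tau):
    """Illustrative POET-style estimate (assumptions: K and tau are user-chosen).

    X   : (n, p) data matrix, rows are observations
    K   : assumed number of common factors (0 <= K <= p)
    tau : assumed constant soft threshold for off-diagonal residual entries
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)                 # centre each variable
    S = Xc.T @ Xc / n                       # sample covariance (1/n scaling)

    # Eigen-decomposition of the symmetric matrix S, sorted in descending order.
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Low-rank part built from the first K principal components.
    V = eigvecs[:, :K]
    low_rank = (V * eigvals[:K]) @ V.T

    # Principal orthogonal complement: what is left after removing the K components.
    R = S - low_rank

    # Soft-threshold the off-diagonal entries; keep the diagonal untouched.
    R_thr = np.sign(R) * np.maximum(np.abs(R) - tau, 0.0)
    np.fill_diagonal(R_thr, np.diag(R))

    return low_rank + R_thr, low_rank, R_thr
```

In this sketch the special cases listed in the abstract appear as limiting choices of the two tuning values: `tau = 0` returns the sample covariance matrix, `K = 0` gives a plain thresholding estimator, and a very large `tau` leaves a low-rank part plus a diagonal residual, which resembles a strict factor-based covariance estimate.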
Pages: 603-680
Page count: 78