A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics

Cited by: 1051
Authors
Schäfer, J [1]
Strimmer, K [1]
Affiliations
[1] Univ Munich, Dept Stat, D-80539 Munich, Germany
Source
STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY | 2005 / Vol. 4
Keywords
shrinkage; covariance estimation; "small n, large p" problem; graphical Gaussian model (GGM); genetic network; gene expression
DOI
10.2202/1544-6115.1175
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Inferring large-scale covariance matrices from sparse genomic data is a ubiquitous problem in bioinformatics. Clearly, the widely used standard covariance and correlation estimators are ill-suited for this purpose. As a statistically efficient and computationally fast alternative, we propose a novel shrinkage covariance estimator that exploits the Ledoit-Wolf (2003) lemma for analytic calculation of the optimal shrinkage intensity. Subsequently, we apply this improved covariance estimator (which has guaranteed minimum mean squared error, is well-conditioned, and is always positive definite even for small sample sizes) to the problem of inferring large-scale gene association networks. We show that it performs very favorably compared to competing approaches, both in simulations and in application to real expression data.
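The idea of shrinking a sample covariance matrix toward a simpler target, with an analytically computed intensity, can be illustrated with a minimal sketch. It assumes a diagonal target that keeps the sample variances and an intensity equal to the summed estimated variances of the off-diagonal covariances divided by their summed squares, clipped to [0, 1]. The function name `shrink_covariance` and the choice of target are illustrative assumptions; the abstract does not spell out the estimator, so this should not be read as the authors' exact method.

```python
import numpy as np

def shrink_covariance(X):
    """Shrink the sample covariance of X (n samples x p variables) toward its diagonal.

    Illustrative sketch only: diagonal target, analytic intensity, assumed form.
    Builds an n x p x p array, so it is meant for small demonstrations.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)                      # center each variable

    # W[k, i, j] = centered product of variables i and j in sample k
    W = Xc[:, :, None] * Xc[:, None, :]
    W_bar = W.mean(axis=0)

    S = n / (n - 1) * W_bar                      # unbiased sample covariance
    # Estimated variance of each entry of S
    var_S = n / (n - 1) ** 3 * ((W - W_bar) ** 2).sum(axis=0)

    off = ~np.eye(p, dtype=bool)                 # mask of off-diagonal entries
    denom = (S[off] ** 2).sum()
    lam = 1.0 if denom == 0 else var_S[off].sum() / denom
    lam = float(np.clip(lam, 0.0, 1.0))          # keep the intensity in [0, 1]

    target = np.diag(np.diag(S))                 # assumed target: sample variances only
    return lam * target + (1.0 - lam) * S, lam

# Example with fewer samples than variables (n = 10, p = 50)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(10, 50))
S_shrunk, lam_demo = shrink_covariance(X_demo)
print("shrinkage intensity:", lam_demo)
print("smallest eigenvalue:", np.linalg.eigvalsh(S_shrunk).min())
```

With fewer samples than variables the unshrunk sample covariance is singular, whereas any positive intensity combined with positive sample variances yields a positive definite result in this sketch.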
Pages: 1 / 30
Page count: 32