TRANSPOSABLE REGULARIZED COVARIANCE MODELS WITH AN APPLICATION TO MISSING DATA IMPUTATION

Times Cited: 83
Authors
Allen, Genevera I. [1 ]
Tibshirani, Robert [1 ]
Affiliations
[1] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
Keywords
Author Keywords: Matrix-variate normal; covariance estimation; imputation; EM algorithm; transposable data
KeyWords Plus: EM ALGORITHM; LIKELIHOOD-ESTIMATION; PENALIZED LIKELIHOOD; MICROARRAYS; REGRESSION; SELECTION
DOI
10.1214/09-AOAS314
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline Classification Codes
020208; 070103; 0714
Abstract
Missing data estimation is an important challenge with high-dimensional data arranged in the form of a matrix. Typically this data matrix is transposable, meaning that either the rows, columns or both can be treated as features. To model transposable data, we present a modification of the matrix-variate normal, the mean-restricted matrix-variate normal, in which the rows and columns each have a separate mean vector and covariance matrix. By placing additive penalties on the inverse covariance matrices of the rows and columns, these so-called transposable regularized covariance models allow for maximum likelihood estimation of the mean and nonsingular covariance matrices. Using these models, we formulate EM-type algorithms for missing data imputation in both the multivariate and transposable frameworks. We present theoretical results exploiting the structure of our transposable models that allow these models and imputation methods to be applied to high-dimensional data. Simulations and results on microarray data and the Netflix data show that these imputation techniques often outperform existing methods and offer a greater degree of flexibility.
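A minimal sketch of the model structure the abstract describes, under stated assumptions: for an n x p data matrix X, the matrix-variate normal is parameterized by a mean matrix M, a row covariance Sigma (n x n), and a column covariance Delta (p x p); the mean-restricted variant is taken here to give the rows and columns separate mean vectors through an additive mean matrix, and the penalty form P(.) and tuning parameters lambda below are illustrative placeholders rather than notation taken from the paper.

% Matrix-variate normal density (standard form)
f(X \mid M, \Sigma, \Delta) = (2\pi)^{-np/2}\, |\Sigma|^{-p/2}\, |\Delta|^{-n/2} \exp\!\left( -\tfrac{1}{2}\, \mathrm{tr}\!\left[ \Sigma^{-1} (X - M)\, \Delta^{-1} (X - M)^{\mathsf{T}} \right] \right)

% Assumed mean restriction: row mean vector \nu \in \mathbb{R}^n and column mean vector \mu \in \mathbb{R}^p
M = \nu\, \mathbf{1}_p^{\mathsf{T}} + \mathbf{1}_n\, \mu^{\mathsf{T}}

% Penalized log-likelihood with additive penalties on the inverse covariances
\ell_{\mathrm{pen}}(\nu, \mu, \Sigma, \Delta) = \log f(X \mid M, \Sigma, \Delta) - \lambda_{\Sigma}\, P(\Sigma^{-1}) - \lambda_{\Delta}\, P(\Delta^{-1})

Maximizing a penalized likelihood of this kind yields nonsingular covariance estimates even in high dimensions, which is what lets the EM-type imputation described in the abstract operate when rows or columns outnumber independent samples.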
Pages: 764-790
Number of Pages: 27