HIGH-DIMENSIONAL REGRESSION WITH NOISY AND MISSING DATA: PROVABLE GUARANTEES WITH NONCONVEXITY

Cited by: 237
Authors
Loh, Po-Ling [1 ]
Wainwright, Martin J. [1 ]
Affiliation
[1] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
Funding
National Science Foundation (NSF), USA
Keywords
High-dimensional statistics; missing data; nonconvexity; regularization; sparse linear regression; M-estimation; selection; Lasso; recovery
DOI
10.1214/12-AOS1018
CLC classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline codes
020208; 070103; 0714
Abstract
Although standard formulations of prediction problems involve fully observed, noiseless data drawn in an i.i.d. manner, many applications involve noisy and/or missing data, possibly with dependence as well. We study these issues in the context of high-dimensional sparse linear regression, and propose novel estimators for the cases of noisy, missing and/or dependent data. Many standard approaches to noisy or missing data, such as those using the EM algorithm, lead to optimization problems that are inherently nonconvex, and it is difficult to establish theoretical guarantees on practical algorithms. While our approach also involves optimizing nonconvex programs, we are able both to analyze the statistical error associated with any global optimum and, more surprisingly, to prove that a simple algorithm based on projected gradient descent will converge in polynomial time to a small neighborhood of the set of all global minimizers. On the statistical side, we provide nonasymptotic bounds that hold with high probability for the cases of noisy, missing and/or dependent data. On the computational side, we prove that under the same types of conditions required for statistical consistency, the projected gradient descent algorithm is guaranteed to converge at a geometric rate to a near-global minimizer. We illustrate these theoretical predictions with simulations, showing close agreement with the predicted scalings.
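The estimator sketched in the abstract can be illustrated concretely. For additive covariate noise with known covariance, the paper's idea is to replace the ordinary least-squares moments with bias-corrected surrogates (which can make the quadratic loss nonconvex when p > n) and to minimize over an ℓ1-ball via projected gradient descent. The sketch below is illustrative only, not the authors' reference implementation: the function names, the data-generating setup, the step-size choice, and the oracle ℓ1-radius `radius = ||β*||₁` are all assumptions made for the demo. The ℓ1-projection uses the standard sorting-based algorithm of Duchi et al. (2008).

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection onto the l1-ball {b : ||b||_1 <= radius},
    via the sorting-based algorithm of Duchi et al. (2008)."""
    if np.abs(v).sum() <= radius:
        return v
    u = np.sort(np.abs(v))[::-1]          # sorted magnitudes, descending
    css = np.cumsum(u)
    k = np.arange(1, len(u) + 1)
    rho = np.nonzero(u * k > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def corrected_lasso_pgd(Z, y, sigma_w2, radius, n_iter=500):
    """Projected gradient descent on the (possibly nonconvex) surrogate
    loss 0.5 * b' Gamma b - gamma' b, where Gamma subtracts the known
    covariate-noise covariance sigma_w2 * I to debias Z'Z/n."""
    n, p = Z.shape
    Gamma = Z.T @ Z / n - sigma_w2 * np.eye(p)  # unbiased for X'X/n
    gamma = Z.T @ y / n                          # unbiased for X'y/n
    step = 1.0 / np.max(np.abs(np.linalg.eigvalsh(Gamma)))
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = Gamma @ beta - gamma
        beta = project_l1_ball(beta - step * grad, radius)
    return beta

# Illustrative experiment: sparse regression with noisy covariates.
rng = np.random.default_rng(0)
n, p, s = 200, 50, 5
beta_star = np.zeros(p)
beta_star[:s] = 1.0
X = rng.standard_normal((n, p))
y = X @ beta_star + 0.1 * rng.standard_normal(n)
Z = X + 0.2 * rng.standard_normal((n, p))        # observed noisy covariates
beta_hat = corrected_lasso_pgd(Z, y, sigma_w2=0.04,
                               radius=np.abs(beta_star).sum())
```

Note that `Gamma` can be indefinite when p > n, which is exactly the nonconvexity the paper analyzes; here, with n > p, the surrogate happens to be convex and the iterates converge geometrically, matching the abstract's computational guarantee.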
Pages: 1637–1664
Page count: 28