Adapting to unknown sparsity by controlling the false discovery rate

Cited by: 224
Authors
Abramovich, Felix [1]
Benjamini, Yoav
Donoho, David L.
Johnstone, Iain M.
Affiliations
[1] Tel Aviv Univ, Sackler Fac Exact Sci, Dept Stat & Operat Res, Sch Math Sci, IL-69978 Tel Aviv, Israel
[2] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
Keywords
thresholding; wavelet denoising; minimax estimation; multiple comparisons; model selection; smoothing parameter selection
DOI
10.1214/009053606000000074
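Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract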
We attempt to recover an n-dimensional vector observed in white noise, where n is large and the vector is known to be sparse, but the degree of sparsity is unknown. We consider three different ways of defining sparsity of a vector: using the fraction of nonzero terms; imposing power-law decay bounds on the ordered entries; and controlling the ℓ_p norm for p small. We obtain a procedure which is asymptotically minimax for ℓ_r loss, simultaneously throughout a range of such sparsity classes. The optimal procedure is a data-adaptive thresholding scheme, driven by control of the false discovery rate (FDR). FDR control is a relatively recent innovation in simultaneous testing, ensuring that at most a certain expected fraction of the rejected null hypotheses will correspond to false rejections. In our treatment, the FDR control parameter q_n also plays a determining role in asymptotic minimaxity. If q = lim q_n ∈ [0, 1/2] and also q_n > γ/log(n), we get sharp asymptotic minimaxity, simultaneously, over a wide range of sparse parameter spaces and loss functions. On the other hand, q = lim q_n ∈ (1/2, 1] forces the risk to exceed the minimax risk by a factor growing with q. To our knowledge, this relation between ideas in simultaneous inference and asymptotic decision theory is new. Our work provides a new perspective on a class of model selection rules which has been introduced recently by several authors. These new rules impose complexity penalization of the form 2 · log(potential model size / actual model size). We exhibit a close connection with FDR-controlling procedures under stringent control of the false discovery rate.
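The data-adaptive thresholding scheme described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the procedure, so the details here are assumptions: the sketch uses the usual Benjamini-Hochberg step-up form, comparing the ordered magnitudes |y|_(k) with the Gaussian quantiles t_k = σ · z(q·k / (2n)), taking the last crossing as the threshold, and applying hard thresholding. The function names `fdr_threshold` and `fdr_estimate`, the default q, and the known noise level `sigma` are all illustrative choices, not taken from the record.

```python
from statistics import NormalDist


def fdr_threshold(y, q=0.1, sigma=1.0):
    """Return a data-adaptive threshold via a Benjamini-Hochberg step-up rule.

    Assumption: y_i = mu_i + sigma * z_i with z_i i.i.d. standard normal
    and sigma known. q is the FDR control parameter.
    """
    n = len(y)
    if n == 0:
        raise ValueError("y must be non-empty")
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")

    std_normal = NormalDist()
    # Magnitudes in decreasing order: |y|_(1) >= |y|_(2) >= ... >= |y|_(n).
    magnitudes = sorted((abs(v) for v in y), reverse=True)
    # t_k is the upper q*k/(2n) quantile of N(0, sigma^2); decreasing in k.
    boundaries = [
        sigma * std_normal.inv_cdf(1.0 - q * k / (2.0 * n))
        for k in range(1, n + 1)
    ]

    # Step-up rule: keep the largest k whose ordered magnitude reaches t_k.
    k_hat = 0
    for k in range(1, n + 1):
        if magnitudes[k - 1] >= boundaries[k - 1]:
            k_hat = k

    # Assumed convention: with no crossing, fall back to the strictest t_1.
    return boundaries[k_hat - 1] if k_hat > 0 else boundaries[0]


def fdr_estimate(y, q=0.1, sigma=1.0):
    """Hard-threshold y at the FDR threshold: keep large entries, zero the rest."""
    threshold = fdr_threshold(y, q=q, sigma=sigma)
    return [v if abs(v) >= threshold else 0.0 for v in y]


if __name__ == "__main__":
    # Two large entries in an otherwise near-zero vector of length 100.
    observed = [8.0, -7.0] + [0.1] * 98
    estimate = fdr_estimate(observed, q=0.1, sigma=1.0)
    print(sum(1 for v in estimate if v != 0.0), "entries kept")
```

Because the threshold depends on how many ordered magnitudes cross the boundary, it becomes lower when the data contain many large entries and higher when they contain few, which is the sense in which such a rule adapts to unknown sparsity.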
Pages: 584-653
Page count: 70