Nonparametric Independence Screening in Sparse Ultra-High-Dimensional Additive Models

Cited by: 438
Authors
Fan, Jianqing [1 ]
Feng, Yang [2 ]
Song, Rui [3 ]
Affiliations
[1] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08544 USA
[2] Columbia Univ, Dept Stat, New York, NY 10027 USA
[3] Colorado State Univ, Dept Stat, Ft Collins, CO 80523 USA
Funding
U.S. National Science Foundation; U.S. National Institutes of Health
Keywords
Additive model; Independent learning; Nonparametric independence screening; Nonparametric regression; Sparsity; Sure independence screening; Variable selection; Nonconcave penalized likelihood; Regression
DOI
10.1198/jasa.2011.tm09779
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline codes
020208; 070103; 0714
Abstract
A variable screening procedure via correlation learning was proposed by Fan and Lv (2008) to reduce dimensionality in sparse ultra-high-dimensional models. Even when the true model is linear, however, the marginal regression can be highly nonlinear. To address this issue, we extend correlation learning to marginal nonparametric learning. Our nonparametric independence screening (NIS) is a specific type of sure independence screening. We propose several closely related variable screening procedures and show that, for general nonparametric models under mild technical conditions, the proposed independence screening methods have a sure screening property. The extent to which the dimensionality can be reduced by independence screening is also explicitly quantified. As a methodological extension, we further propose a data-driven thresholding rule and an iterative nonparametric independence screening (INIS) method to enhance the finite-sample performance for fitting sparse additive models. Simulation results and a real-data analysis demonstrate that the proposed procedure works well with moderate sample size and large dimension, and performs better than competing methods.
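The core idea of the screening step described in the abstract can be sketched as follows: fit a marginal nonparametric regression of the response on each covariate separately, rank covariates by the goodness of these marginal fits, and keep the top-ranked ones. This is a minimal illustration, not the authors' implementation: the function name `nis_screen` and its parameters are hypothetical, a low-degree polynomial basis stands in for the B-spline basis used in the paper, and the fixed `n_keep` cutoff stands in for the paper's data-driven threshold.

```python
import numpy as np

def nis_screen(X, y, n_keep, degree=3):
    """Rank covariates by the fit of a marginal polynomial regression of y on X_j.

    Illustrative sketch of nonparametric independence screening: for each
    covariate, regress y on a small polynomial basis of that covariate alone,
    score it by the marginal R^2, and keep the n_keep best-scoring covariates.
    """
    n, p = X.shape
    y_c = y - y.mean()                  # center the response once
    tss = np.sum(y_c ** 2)              # total sum of squares
    scores = np.empty(p)
    for j in range(p):
        # Marginal basis [x, x^2, ..., x^degree]; centering removes the intercept.
        B = np.vander(X[:, j], degree + 1, increasing=True)[:, 1:]
        B = B - B.mean(axis=0)
        coef, *_ = np.linalg.lstsq(B, y_c, rcond=None)
        rss = np.sum((y_c - B @ coef) ** 2)
        scores[j] = 1.0 - rss / tss     # marginal R^2 of the fit on X_j alone
    # Indices of the n_keep covariates with the largest marginal fits.
    return np.argsort(scores)[::-1][:n_keep]
```

In an additive model whose components are nonlinear (e.g. a sine and a quadratic), the marginal linear correlation with an active covariate can be near zero, while the marginal nonparametric fit above still detects it; this is the motivation for replacing correlation learning with marginal nonparametric learning.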
Pages: 544-557 (14 pages)
Related references (39 in total)
[11] Fan, J. Q. (2009). Journal of Machine Learning Research, 10, 2013.
[12] Fan, J. Q., & Jiang, J. C. (2005). Nonparametric inferences for additive models. Journal of the American Statistical Association, 100(471), 890-907.
[13] Fan, J. Q., & Li, R. Z. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456), 1348-1360.
[14] Hall, P., & Miller, H. (2009). Using generalized correlation to effect variable selection in very high dimensional problems. Journal of Computational and Graphical Statistics, 18(3), 533-550.
[15] Hall, P., Titterington, D. M., & Xue, J.-H. (2009). Tilting methods for assessing the influence of components in a classifier. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 71, 783-803.
[16] Horowitz, J., Klemelä, J., & Mammen, E. (2006). Optimal estimation in additive regression models. Bernoulli, 12(2), 271-298.
[17] Huang, J., Horowitz, J. L., & Ma, S. (2008). Asymptotic properties of bridge estimators in sparse high-dimensional regression models. Annals of Statistics, 36(2), 587-613.
[18] Huang, J., Horowitz, J. L., & Wei, F. (2010). Variable selection in nonparametric additive models. Annals of Statistics, 38(4), 2282-2313.
[19] Irizarry, R. A., Hobbs, B., Collin, F., Beazer-Barclay, Y. D., Antonellis, K. J., Scherf, U., & Speed, T. P. (2003). Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics, 4(2), 249-264.
[20] Kim, Y. (2006). Statistica Sinica, 16, 375.