ROBUST RANK CORRELATION BASED SCREENING

Times cited: 269
Authors
Li, Gaorong [1]
Peng, Heng [2]
Zhang, Jun [3]
Zhu, Lixing [2]
Affiliations
[1] Beijing Univ Technol, Coll Appl Sci, Beijing 100124, Peoples R China
[2] Hong Kong Baptist Univ, Dept Math, Hong Kong, Hong Kong, Peoples R China
[3] Shenzhen Univ, Shen Zhen Hong Kong Joint Res Ctr Appl Stat, Shenzhen 518060, Peoples R China
Funding
Specialized Research Fund for the Doctoral Program of Higher Education (China);
Keywords
Variable selection; rank correlation screening; dimensionality reduction; semiparametric models; large p small n; SIS; NONCONCAVE PENALIZED LIKELIHOOD; GENERALIZED LINEAR-MODELS; SLICED INVERSE REGRESSION; VARIABLE SELECTION; DIVERGING NUMBER;
DOI
10.1214/12-AOS1024
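Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code
020208 ; 070103 ; 0714 ;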
Abstract
Independence screening is a variable selection method that uses a ranking criterion to select significant variables, particularly for statistical models with nonpolynomial dimensionality, that is, "large p, small n" settings in which p can be as large as an exponential of the sample size n. In this paper we propose a robust rank correlation screening (RRCS) method for ultra-high dimensional data. The procedure is based on the Kendall tau correlation coefficient between the response and each predictor, rather than on the Pearson correlation used by existing methods. Compared with existing independence screening methods, the new method has four desirable features. First, the sure independence screening property holds when the predictors have only a finite second moment, rather than exponential tails or similar conditions, even when the number of predictors grows exponentially with the sample size. Second, it can handle semiparametric models, such as transformation regression models and single-index models with a monotonic link function, without any nonparametric estimation, even though these models contain nonparametric functions. Third, the procedure is largely robust against outliers and influential points in the observations. Last, because the rank correlation statistic is built from indicator functions and is therefore bounded, its theoretical analysis is much simpler than in previous studies on variable screening. Simulations compare RRCS with existing methods, and a real data example is analyzed.
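The screening step described in the abstract can be sketched as follows: compute a rank correlation between the response and each predictor separately, then keep the predictors with the largest absolute values. This is a minimal sketch under stated assumptions, not the authors' implementation. It uses Kendall's tau-a (pairwise sign comparisons, O(n^2) work per predictor). To our understanding the paper's statistic is an indicator-function form of Kendall's tau that, without ties, differs from tau-a only by a constant factor and therefore gives the same ranking, but that equivalence is an assumption here. The function names `kendall_tau_a` and `rank_correlation_screen`, the default submodel size d = floor(n / log n), and the simulated model in the demo are hypothetical illustrations.

```python
import math
import random


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / (n(n-1)/2).

    Each pair contributes sign(x_j - x_i) * sign(y_j - y_i), so ties add 0
    and the statistic depends on the data only through their ranks.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            sx = (x[j] > x[i]) - (x[j] < x[i])  # sign of x_j - x_i
            sy = (y[j] > y[i]) - (y[j] < y[i])  # sign of y_j - y_i
            s += sx * sy
    return s / (n * (n - 1) / 2)


def rank_correlation_screen(X, y, d=None):
    """Rank predictors by |Kendall tau| with y and keep the top d.

    X is a list of n rows, each holding p predictor values.
    Returns (indices of the kept predictors, list of all p scores).
    """
    n, p = len(X), len(X[0])
    if d is None:
        # Assumed default submodel size d = floor(n / log n).
        d = max(1, int(n / math.log(n)))
    scores = [abs(kendall_tau_a([row[k] for row in X], y)) for k in range(p)]
    order = sorted(range(p), key=lambda k: scores[k], reverse=True)
    return order[:d], scores


if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 100, 200
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    # Hypothetical transformation model Y = (X_0 + X_1 + eps)**3 with
    # heavy-tailed Cauchy noise eps (scale 0.5); only predictors 0 and 1 matter.
    y = [(row[0] + row[1] + 0.5 * math.tan(math.pi * (rng.random() - 0.5))) ** 3
         for row in X]
    kept, _ = rank_correlation_screen(X, y)
    print(len(kept), "predictors kept; both active kept:", {0, 1} <= set(kept))
```

Because each pair contributes only -1, 0 or +1, a single extreme observation can move tau-a by at most 4/n, and applying a strictly increasing transformation to Y (the cube in the demo) leaves every score unchanged. This is the intuition behind the robustness and transformation-model features listed in the abstract.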
Pages: 1846-1877
Number of pages: 32