Artificial Skill due to Predictor Screening

Cited by: 102
Authors
DelSole, Timothy
Shukla, Jagadish
Affiliations
[1] George Mason Univ, Fairfax, VA 22030 USA
[2] Ctr Ocean Land Atmosphere Studies, Calverton, MD USA
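Funding
National Aeronautics and Space Administration (NASA); National Oceanic and Atmospheric Administration (NOAA); National Science Foundation (NSF)
Keywords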
TROPICAL CYCLONE ACTIVITY; MONTE-CARLO TECHNIQUES; REGRESSION-MODELS; SAMPLING ERRORS; STRATEGIES; EMPHASIS;
DOI
10.1175/2008JCLI2414.1
Chinese Library Classification number
P4 [Atmospheric Sciences (Meteorology)]
Discipline classification code
0706; 070601
Abstract
This paper shows that if predictors are selected preferentially because of their strong correlation with a prediction variable, then standard methods for validating prediction models derived from these predictors will be biased. This bias is demonstrated by screening random numbers and showing that regression models derived from these random numbers have apparent skill, in a cross-validation sense, even though the predictors cannot possibly have the slightest predictive usefulness. This result seemingly implies that random numbers can give useful predictions, since the sample being predicted is separate from the sample used to estimate the regression model. The resolution of this paradox is that, prior to cross validation, all of the data had been used to evaluate correlations for selecting predictors. This situation differs from real-time forecasts in that the future sample is not available for screening. These results clarify the fallacy in assuming that if a model performs well in cross-validation mode, then it will perform well in real-time forecasts. This bias appears to afflict several forecast schemes that have been proposed in the literature, including operational forecasts of Indian monsoon rainfall and number of Atlantic hurricanes. The cross-validated skill of these models probably would not be distinguishable from that of a no-skill model if prior screening were taken into account.
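The screening effect described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' code or experimental setup: the sample size, number of candidate predictors, number of retained predictors, leave-one-out scheme, and all function names below are assumptions chosen for illustration. Both the predictand and the candidate predictors are pure random numbers, so any apparent skill is artificial.

```python
import numpy as np


def top_k_by_correlation(X, y, k):
    """Indices of the k columns of X with the largest |correlation| with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(corr))[:k]


def loo_cv_skill(X, y, k, screen_inside_cv):
    """Correlation between leave-one-out predictions and y.

    screen_inside_cv=False: predictors are screened once using ALL samples,
    then cross validated (the biased procedure).
    screen_inside_cv=True: screening is repeated within each fold using only
    the training samples (the held-out sample is never seen).
    """
    n = len(y)
    preds = np.empty(n)
    if not screen_inside_cv:
        selected = top_k_by_correlation(X, y, k)
    for i in range(n):
        train = np.arange(n) != i
        if screen_inside_cv:
            selected = top_k_by_correlation(X[train], y[train], k)
        # Ordinary least squares with an intercept on the training samples.
        A = np.column_stack([np.ones(n - 1), X[train][:, selected]])
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        preds[i] = coef[0] + X[i, selected] @ coef[1:]
    return np.corrcoef(preds, y)[0, 1]


def run_experiment(n_trials=20, n_samples=40, n_candidates=500, k=3, seed=0):
    """Average cross-validated skill over independent random-number trials."""
    rng = np.random.default_rng(seed)
    biased, honest = [], []
    for _ in range(n_trials):
        y = rng.standard_normal(n_samples)                  # random predictand
        X = rng.standard_normal((n_samples, n_candidates))  # random predictors
        biased.append(loo_cv_skill(X, y, k, screen_inside_cv=False))
        honest.append(loo_cv_skill(X, y, k, screen_inside_cv=True))
    return float(np.mean(biased)), float(np.mean(honest))


biased_skill, honest_skill = run_experiment()
print(f"screening before cross validation: mean skill = {biased_skill:.2f}")
print(f"screening inside cross validation: mean skill = {honest_skill:.2f}")
```

Under these assumptions, screening on the full sample before cross validation should yield a clearly positive mean correlation, while repeating the screening inside each fold should not, consistent with the abstract's point that the held-out sample must also be withheld from predictor selection.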
Pages: 331-345
Number of pages: 15