MODEL UNCERTAINTY, DATA MINING AND STATISTICAL INFERENCE

Times cited: 645
Author
CHATFIELD, C
Source
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES A-STATISTICS IN SOCIETY | 1995 / Vol. 158
Keywords
AUTOREGRESSIVE MODEL; BAYESIAN MODEL AVERAGING; DATA MINING; FORECASTING; MODEL BUILDING; RESAMPLING; STATISTICAL INFERENCE; SUBSET SELECTION
DOI
10.2307/2983440
Chinese Library Classification (CLC)
O1 [Mathematics]; C [Social Sciences, General]
Discipline classification codes
03; 0303; 0701; 070101
Abstract
This paper takes a broad, pragmatic view of statistical inference to include all aspects of model formulation. The estimation of model parameters traditionally assumes that a model has a prespecified known form and takes no account of possible uncertainty regarding the model structure. This implicitly assumes the existence of a 'true' model, which many would regard as a fiction. In practice model uncertainty is a fact of life and likely to be more serious than other sources of uncertainty which have received far more attention from statisticians. This is true whether the model is specified on subject-matter grounds or, as is increasingly the case, when a model is formulated, fitted and checked on the same data set in an iterative, interactive way. Modern computing power allows a large number of models to be considered and data-dependent specification searches have become the norm in many areas of statistics. The term data mining may be used in this context when the analyst goes to great lengths to obtain a good fit. This paper reviews the effects of model uncertainty, such as too narrow prediction intervals, and the non-trivial biases in parameter estimates which can follow data-based modelling. Ways of assessing and overcoming the effects of model uncertainty are discussed, including the use of simulation and resampling methods, a Bayesian model averaging approach and collecting additional data wherever possible. Perhaps the main aim of the paper is to ensure that statisticians are aware of the problems and start addressing the issues even if there is no simple, general theoretical fix.
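The abstract's warning that formulating and testing a model on the same data distorts inference can be illustrated with a small simulation. The sketch below is not taken from the paper. It assumes a deliberately simple, hypothetical setting: one response and 20 candidate predictors, all pure noise, with the "search" reduced to picking the single predictor with the largest |t|, a crude stand-in for a data-dependent specification search. The function names `slope_t_stat` and `simulate` and every parameter value are illustrative assumptions.

```python
import math
import random
import statistics


def slope_t_stat(x, y):
    """Least-squares slope of y on x (with intercept) and its usual t-statistic."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, beta / se


def simulate(n_obs=30, n_candidates=20, n_reps=1000, t_crit=2.048, seed=1):
    """Hypothetical experiment: fixed-in-advance vs data-selected predictor.

    Every true slope is zero. Returns the fraction of replications in which
    the naive two-sided 5% t-test rejects beta = 0 for (a) predictor 0,
    chosen before seeing the data, and (b) whichever predictor has the
    largest |t| on the same data. t_crit = 2.048 is the 97.5% point of
    Student's t with n_obs - 2 = 28 degrees of freedom (assumes n_obs = 30).
    """
    rng = random.Random(seed)
    reject_fixed = reject_selected = 0
    for _ in range(n_reps):
        # Response and all candidates are independent N(0, 1) noise.
        y = [rng.gauss(0, 1) for _ in range(n_obs)]
        t_stats = []
        for _ in range(n_candidates):
            x = [rng.gauss(0, 1) for _ in range(n_obs)]
            t_stats.append(slope_t_stat(x, y)[1])
        reject_fixed += abs(t_stats[0]) > t_crit
        reject_selected += max(abs(t) for t in t_stats) > t_crit
    return reject_fixed / n_reps, reject_selected / n_reps


if __name__ == "__main__":
    fixed, selected = simulate()
    print(f"pre-specified predictor rejection rate: {fixed:.3f}")
    print(f"data-selected predictor rejection rate: {selected:.3f}")
```

Under this design the pre-specified predictor should be rejected close to the nominal 5% of the time. The data-selected one should be rejected far more often, near 1 - 0.95^20 ≈ 0.64, because with independent noise predictors the 20 t-statistics are independent. The naive test ignores the search, so it reports much more certainty than the data warrant.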
Pages: 419-466
Page count: 48