Strictly proper scoring rules, prediction, and estimation

Cited by: 3064
Authors
Gneiting, Tilmann [1]
Raftery, Adrian E. [1]
Affiliations
[1] Univ Washington, Dept Stat, Seattle, WA 98195 USA
Funding
U.S. National Science Foundation
Keywords
Bayes factor; Bregman divergence; Brier score; coherent; continuous ranked probability score; cross-validation; entropy; kernel score; loss function; minimum contrast estimation; negative definite function; prediction interval; predictive distribution; quantile forecast; scoring rule; skill score; strictly proper; utility function
DOI
10.1198/016214506000001437
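Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract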
Scoring rules assess the quality of probabilistic forecasts, by assigning a numerical score based on the predictive distribution and on the event or value that materializes. A scoring rule is proper if the forecaster maximizes the expected score for an observation drawn from the distribution F if he or she issues the probabilistic forecast F, rather than G ≠ F. It is strictly proper if the maximum is unique. In prediction problems, proper scoring rules encourage the forecaster to make careful assessments and to be honest. In estimation problems, strictly proper scoring rules provide attractive loss and utility functions that can be tailored to the problem at hand. This article reviews and develops the theory of proper scoring rules on general probability spaces, and proposes and discusses examples thereof. Proper scoring rules derive from convex functions and relate to information measures, entropy functions, and Bregman divergences. In the case of categorical variables, we prove a rigorous version of the Savage representation. Examples of scoring rules for probabilistic forecasts in the form of predictive densities include the logarithmic, spherical, pseudospherical, and quadratic scores. The continuous ranked probability score applies to probabilistic forecasts that take the form of predictive cumulative distribution functions. It generalizes the absolute error and forms a special case of a new and very general type of score, the energy score. Like many other scoring rules, the energy score admits a kernel representation in terms of negative definite functions, with links to inequalities of Hoeffding type, in both univariate and multivariate settings. Proper scoring rules for quantile and interval forecasts are also discussed. We relate proper scoring rules to Bayes factors and to cross-validation, and propose a novel form of cross-validation known as random-fold cross-validation.
A case study on probabilistic weather forecasts in the North American Pacific Northwest illustrates the importance of propriety. We note optimum score approaches to point and quantile estimation, and propose the intuitively appealing interval score as a utility function in interval estimation that addresses width as well as coverage.
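
The definition of propriety given in the abstract can be checked numerically in the simplest setting, a binary event. The sketch below is illustrative only and is not taken from the article: the function names, the grid search, and the single-event form of the quadratic (Brier) score are assumptions made for this example. It uses the positive orientation of the abstract (larger scores are better) for the binary scores, adds the linear score as a contrasting rule that is not proper, and includes the interval score in its usual negatively oriented form, width plus a penalty of 2/alpha times the distance by which the observation misses the interval.

```python
import math


def brier_score(q, outcome):
    """Quadratic (Brier) score for a binary event, positively oriented.

    q is the forecast probability of the event; outcome is 1 or 0.
    Assumption: single-event form, -(outcome - q)^2.
    """
    return -((outcome - q) ** 2)


def log_score(q, outcome):
    """Logarithmic score: log of the probability assigned to what happened."""
    return math.log(q if outcome == 1 else 1.0 - q)


def linear_score(q, outcome):
    """Linear score: the probability assigned to what happened (not proper)."""
    return q if outcome == 1 else 1.0 - q


def expected_score(score, q, p):
    """Expected score of forecast q when the event has true probability p."""
    return p * score(q, 1) + (1.0 - p) * score(q, 0)


def best_forecast(score, p, grid_size=99):
    """Forecast on the grid 0.01, ..., 0.99 that maximizes the expected score."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return max(grid, key=lambda q: expected_score(score, q, p))


def interval_score(lower, upper, x, alpha):
    """Interval score for a central (1 - alpha) prediction interval.

    Negatively oriented (smaller is better): interval width plus a
    penalty of 2/alpha times the miss distance when x falls outside.
    """
    penalty = 0.0
    if x < lower:
        penalty = (2.0 / alpha) * (lower - x)
    elif x > upper:
        penalty = (2.0 / alpha) * (x - upper)
    return (upper - lower) + penalty


if __name__ == "__main__":
    p_true = 0.3
    for name, rule in [("Brier", brier_score),
                       ("logarithmic", log_score),
                       ("linear", linear_score)]:
        print(name, best_forecast(rule, p_true))
    print(interval_score(0.0, 10.0, 12.0, alpha=0.1))
```

Under the two strictly proper rules, the grid search returns the true probability, so an honest forecast is optimal. Under the linear score it returns the most extreme grid value on the favoured side, which is the kind of hedging incentive that propriety rules out.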
Pages: 359-378
Number of pages: 20