Multiple regression for environmental data: nonlinearities and prediction bias

Cited by: 28
Authors
Geladi, P [1]
Hadjiiski, L
Hopke, P
Affiliations
[1] Umea Univ, Dept Organ Chem, S-90187 Umea, Sweden
[2] Clarkson Univ, Dept Chem, Potsdam, NY 13699 USA
Keywords
nonlinear multiple regression; partial least squares regression; prediction bias; local bias; environmental data
DOI
10.1016/S0169-7439(98)00204-4
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Multiple regression models are often tested by plotting predicted against measured values. In these plots, all observations should ideally fall on the diagonal. Points off the diagonal indicate unmodeled behaviour. Some of these deviations are caused by random noise. Environmental data carry considerable measurement and sampling noise, and one is not supposed to model or predict this noise. However, there can also be systematic variation, a bias. This bias often appears as systematically low predictions for high values: the high values fall below the diagonal in the plot. One kind of bias is a contraction: the high values are predicted too low and the low values are predicted too high, so the predictions are contracted around the center of the data set. One factor contributing to bias or contraction is nonlinearity in the true physical relationship. The data set consists of hourly ozone measurements and parallel measurements of nitrogen oxides, temperature, UV radiation and more than 50 organic chemicals. The measurements were made on surface air in an urban environment. It may be assumed that the ozone concentrations are influenced by all the other variables, so a multivariate regression model may be built with 57 predictor variables and ozone concentration as the response variable. Because of expected collinearities and large noise, a partial least squares (PLS) regression model is chosen. The total set of 717 objects is split into a calibration set of 358 objects and a test set of 359 objects. The data are noisy and the relationship is very nonlinear. It is shown how contraction and prediction bias occur and how extra steps of reducing and nonlinearizing the data remove these effects until only a substantial random noise is left. © 1999 Elsevier Science B.V. All rights reserved.
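The predicted-against-measured diagnostic described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and none of it is the authors' data or method: the data are synthetic, there is a single predictor instead of 57, and an ordinary least-squares line stands in for the PLS model. Only the object counts (717 split into 358 and 359) are taken from the abstract. The helper names `fit_line` and `contraction_report` are hypothetical.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def contraction_report(measured, predicted, frac=0.1):
    """Summarize a predicted-vs-measured plot numerically.

    Returns the slope of predicted regressed on measured (1.0 would mean
    the points follow the diagonal's slope) and the mean of
    (predicted - measured) in the lowest and highest `frac` of measured values.
    """
    _, slope = fit_line(measured, predicted)
    pairs = sorted(zip(measured, predicted))
    k = max(1, int(len(pairs) * frac))
    low_bias = sum(p - m for m, p in pairs[:k]) / k
    high_bias = sum(p - m for m, p in pairs[-k:]) / k
    return slope, low_bias, high_bias


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Synthetic stand-in data (assumption): a curved response plus random noise.
n = 717
x = [rng.uniform(0.0, 3.0) for _ in range(n)]
y = [xi ** 2 + rng.gauss(0.0, 1.0) for xi in x]

# Random split into calibration (358) and test (359) objects.
idx = list(range(n))
rng.shuffle(idx)
cal, test = idx[:358], idx[358:]

# Fit a straight line on the calibration set only (stand-in for PLS).
a, b = fit_line([x[i] for i in cal], [y[i] for i in cal])

# Evaluate on the held-out test set.
measured = [y[i] for i in test]
predicted = [a + b * x[i] for i in test]
slope, low_bias, high_bias = contraction_report(measured, predicted)

print(f"slope of predicted vs measured: {slope:.3f}")
print(f"mean (predicted - measured), lowest 10%:  {low_bias:.3f}")
print(f"mean (predicted - measured), highest 10%: {high_bias:.3f}")
```

In this toy setting, a slope below 1 together with a negative mean error among the highest measured values corresponds to the pattern the abstract describes, with high values falling below the diagonal. Whether the same numbers would arise for the ozone data cannot be inferred from this sketch.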
Pages: 165-173
Number of pages: 9