Variable selection in wavelet regression models

Cited by: 84
Authors
Alsberg, BK [1]
Woodward, AM
Winson, MK
Rowland, JJ
Kell, DB
Affiliations
[1] Univ Wales, Inst Biol Sci, Aberystwyth SY23 3DD, Ceredigion, Wales
[2] Univ Wales, Dept Comp Sci, Aberystwyth SY23 3DD, Ceredigion, Wales
Funding
Biotechnology and Biological Sciences Research Council (UK);
Keywords
wavelet regression; multivariate calibration; partial least squares; infrared spectra; feature selection; variable selection; mutual information; scalogram; feature extraction;
DOI
10.1016/S0003-2670(98)00194-9
Chinese Library Classification number
O65 [Analytical Chemistry];
Discipline classification code
070302; 081704;
Abstract
Variable selection and compression are often used to produce more parsimonious regression models. However, when they are applied directly to the original spectrum domain, it is not easy to determine the type of feature the selected variables represent. By performing variable selection in the wavelet domain we show that it is possible to identify important variables as being part of short- or large-scale features. Therefore, the suggested method is able to extract information about the selected variables that otherwise would have been inaccessible. We are also able to obtain information about the location of these features in the original domain. In this article we demonstrate three types of variable selection methods applied to the wavelet domain: selection of the optimal combination of scales, thresholding based on mutual information, and truncation of weight vectors in the partial least squares (PLS) regression algorithm. We found that truncation of weight vectors in PLS was the most effective method for selecting variables. For the two experimental data sets tested we obtained approximately the same prediction error using fewer than 1% (for Data set 1) and 10% (for Data set 2) of the original variables. We also discovered that the selected variables were restricted to a limited number of wavelet scales. This information can be used to suggest whether the underlying features may be dominated by narrow (selective) peaks (indicated by variables in short wavelet scale regions) or by broader regions (indicated by variables in long wavelet scale regions). Thus, wavelet regression is here used as an extension of the more traditional Fourier regression (where the modelling is performed in the frequency domain without taking into consideration any of the information in the time domain). © 1998 Elsevier Science B.V. All rights reserved.
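The idea of truncating PLS weight vectors in the wavelet domain can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the wavelet, the truncation rule, or the number of PLS components, so the sketch assumes a Haar wavelet, a single PLS1 weight vector (proportional to the covariance between each wavelet coefficient and the response), and a simple "keep the largest weights" rule. All function names and the toy data are hypothetical.

```python
import math


def haar_dwt(signal):
    """Full orthonormal Haar decomposition of a signal whose length is a power of two.

    Output order: [approximation, coarsest detail, ..., finest details].
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = [float(v) for v in signal]
    details = []
    root2 = math.sqrt(2.0)
    while len(approx) > 1:
        evens, odds = approx[0::2], approx[1::2]
        detail = [(a - b) / root2 for a, b in zip(evens, odds)]
        approx = [(a + b) / root2 for a, b in zip(evens, odds)]
        details = detail + details  # prepend so coarser scales come first
    return approx + details


def first_pls_weight(X, y):
    """First PLS1 weight vector, proportional to X^T y after mean-centring."""
    n, p = len(X), len(X[0])
    col_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    w = [sum((X[i][j] - col_mean[j]) * (y[i] - y_mean) for i in range(n))
         for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w] if norm > 0 else w


def truncate_weights(w, keep):
    """Zero all but the `keep` largest-magnitude weights (assumed truncation rule)."""
    order = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    kept = sorted(order[:keep])
    kept_set = set(kept)
    return [w[j] if j in kept_set else 0.0 for j in range(len(w))], kept


def scale_of(index):
    """Scale level of a coefficient index in the ordering used by haar_dwt.

    0 is the approximation; larger numbers are finer (shorter) scales.
    """
    return index.bit_length()


# Toy data (hypothetical): a narrow peak at position 4 whose height equals y.
y = [1.0, 2.0, 3.0, 4.0]
spectra = [[0, 0, 0, 0, c, 0, 0, 0] for c in y]
coeffs = [haar_dwt(s) for s in spectra]

w = first_pls_weight(coeffs, y)
w_trunc, kept = truncate_weights(w, keep=2)
print("kept coefficient indices:", kept)
print("their scale levels:", [scale_of(j) for j in kept])
```

In this toy case the retained coefficients sit at the finer scales, which is the kind of evidence the abstract describes for a narrow peak; a broad feature would instead concentrate weight at the coarse scales. A full analysis would use several PLS components and validate the prediction error after truncation.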
Pages: 29-44
Page count: 16