Bayesian wavelet regression on curves with application to a spectroscopic calibration problem

Cited by: 121
Authors
Brown, PJ [1]
Fearn, T
Vannucci, M
Affiliations
[1] Univ Kent, Inst Math & Stat, Canterbury CT2 7NF, Kent, England
[2] UCL, Dept Stat Sci, London WC1E 6BT, England
[3] Texas A&M Univ, Dept Stat, College Stn, TX 77843 USA
Keywords
Markov chain Monte Carlo; mixture prior; model averaging; multivariate regression; near-infrared spectroscopy; variable selection
DOI
10.1198/016214501753168118
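Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract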
Motivated by calibration problems in near-infrared (NIR) spectroscopy, we consider the linear regression setting in which the many predictor variables arise from sampling an essentially continuous curve at equally spaced points and there may be multiple predictands. We tackle this regression problem by calculating the wavelet transforms of the discretized curves, then applying a Bayesian variable selection method using mixture priors to the multivariate regression of predictands on wavelet coefficients. For prediction purposes, we average over a set of likely models. Applied to a particular problem in NIR spectroscopy, this approach was able to find subsets of the wavelet coefficients with overall better predictive performance than the more usual approaches. In the application, the available predictors are measurements of the NIR reflectance spectrum of biscuit dough pieces at 256 equally spaced wavelengths. The aim is to predict the composition (i.e., the fat, flour, sugar, and water content) of the dough pieces using the spectral variables. Thus we have a multivariate regression of four predictands on 256 predictors with quite high intercorrelation among the predictors. A training set of 39 samples is available to fit this regression. Applying a wavelet transform replaces the 256 measurements on each spectrum with 256 wavelet coefficients that carry the same information. The variable selection method could use subsets of these coefficients that gave good predictions for all four compositional variables on a separate test set of samples. Selecting in the wavelet domain rather than from the original spectral variables is appealing in this application, because a single wavelet coefficient can carry information from a band of wavelengths in the original spectrum. This band can be narrow or wide, depending on the scale of the wavelet selected.
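The two properties the abstract relies on (256 measurements become 256 coefficients carrying the same information, and each coefficient summarizes a band whose width depends on scale) can be illustrated with a minimal sketch. It makes several assumptions that are not in the record: it uses the Haar wavelet purely for simplicity, which is not necessarily the wavelet family used in the paper; the spectrum is synthetic random data, not the biscuit dough measurements; and the names haar_dwt, haar_idwt, and band_width are hypothetical helpers.

```python
import numpy as np

def haar_dwt(signal):
    """Full orthonormal Haar transform of a vector whose length is a power of two.

    Assumption: Haar is chosen only for illustration.
    """
    approx = np.asarray(signal, dtype=float)
    n = approx.size
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    details = []
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # detail coefficients at this scale
        approx = (even + odd) / np.sqrt(2.0)         # smoothed curve passed to the next scale
    # Layout: [scaling coefficient, coarsest detail, ..., finest details]
    return np.concatenate([approx] + details[::-1])

def haar_idwt(coeffs):
    """Invert haar_dwt, rebuilding the curve from coarse to fine scales."""
    coeffs = np.asarray(coeffs, dtype=float)
    approx = coeffs[:1]
    pos = 1
    while pos < coeffs.size:
        detail = coeffs[pos:pos + approx.size]
        pos += approx.size
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + detail) / np.sqrt(2.0)
        out[1::2] = (approx - detail) / np.sqrt(2.0)
        approx = out
    return approx

def band_width(index, n=256):
    """Number of original sample points that a single coefficient depends on."""
    unit = np.zeros(n)
    unit[index] = 1.0
    return int(np.count_nonzero(haar_idwt(unit)))

# Synthetic stand-in for one spectrum sampled at 256 equally spaced wavelengths.
rng = np.random.default_rng(0)
spectrum = np.cumsum(rng.normal(size=256))

coeffs = haar_dwt(spectrum)
reconstructed = haar_idwt(coeffs)

print(coeffs.size)                           # same number of values as the spectrum
print(np.allclose(reconstructed, spectrum))  # the transform loses no information
print(band_width(1), band_width(255))        # coarsest vs. finest detail coefficient
```

In this sketch the coarsest detail coefficient depends on all 256 sample points while a finest-scale coefficient depends on only two adjacent points, which is the narrow-or-wide band behaviour the abstract describes. A variable selection step would then operate on the columns of the coefficient matrix rather than on the raw wavelengths.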
Pages: 398-408
Page count: 11