Genetic Algorithm in the Wavelet Domain for Large p Small n Regression

Cited by: 2
Authors
Howe, Eylem Deniz [1]
Nicolis, Orietta [2]
Affiliations
[1] Mimar Sinan Fine Arts Univ, Dept Stat, Istanbul, Turkey
[2] Univ Valparaiso, Inst Estadist, Valparaiso, Chile
Keywords
Functional regression; Genetic algorithm; Wavelet domain; 46N30; 65T60; 65Y10; 32A70; VARIABLE SELECTION; WAVELET REGRESSION; LINEAR-REGRESSION; OPTIMIZATION; SHRINKAGE; CURVES;
DOI
10.1080/03610918.2013.809101
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject Classification Codes
020208; 070103; 0714
Abstract
Many areas of statistical modeling are plagued by the curse of dimensionality, in which there are more variables than observations. This is especially true when developing functional regression models where the independent dataset is some type of spectral decomposition, such as data from near-infrared spectroscopy. While we could develop a very complex model by simply taking enough samples (such that n > p), this could prove impossible or prohibitively expensive. In addition, a regression model developed like this could turn out to be highly inefficient, as spectral data usually exhibit high multicollinearity. In this article, we propose a two-part algorithm for selecting an effective and efficient functional regression model. Our algorithm begins by evaluating a subset of discrete wavelet transformations, allowing for variation in both wavelet and filter number. Next, we perform an intermediate processing step to remove variables with low correlation to the response data. Finally, we use the genetic algorithm to perform a stochastic search through the subset regression model space, driven by an information-theoretic objective function. We allow our algorithm to develop the regression model for each response variable independently, so as to optimally model each variable. We demonstrate our method on the familiar biscuit dough dataset, which has been used in a similar context by several researchers. Our results demonstrate both the flexibility and the power of our algorithm. For each response variable, a different subset model is selected, and different wavelet transformations are used. The models developed by our algorithm show an improvement, as measured by lower mean error, over results in the published literature.
Pages: 1144 / 1157 (14 pages)
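The three-stage procedure summarized in the abstract (a discrete wavelet transform of the spectra, correlation-based screening of the coefficients, then a genetic-algorithm search over subset regression models scored by an information-theoretic criterion) can be sketched roughly as follows. This is an illustrative toy on synthetic data, not the authors' implementation: the single-level Haar step stands in for the bank of wavelet/filter choices the paper compares, and the screening size `k`, population settings, and AIC-style fitness are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 100  # "large p, small n": more predictors than observations

# Step 1 (stand-in): single-level Haar DWT of each row of toy "spectra".
def haar(row):
    a = (row[0::2] + row[1::2]) / np.sqrt(2.0)  # approximation coefficients
    d = (row[0::2] - row[1::2]) / np.sqrt(2.0)  # detail coefficients
    return np.concatenate([a, d])

raw = rng.standard_normal((n, p))           # toy spectra, p channels
X = np.apply_along_axis(haar, 1, raw)       # wavelet-domain design matrix (n x p)

# Response depends on a few wavelet coefficients plus noise.
beta = np.zeros(p)
beta[[3, 17, 42]] = [2.0, -1.5, 1.0]
y = X @ beta + 0.1 * rng.standard_normal(n)

# Step 2: screen out coefficients with low correlation to the response.
k = 20
corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
keep = np.argsort(corr)[-k:]                # indices of the k best columns
Xs = X[:, keep]

def aic(mask):
    """AIC-style fitness for a binary inclusion mask (lower is better)."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return np.inf
    Z = Xs[:, idx]
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss = np.sum((y - Z @ coef) ** 2)
    return n * np.log(rss / n) + 2 * idx.size

# Step 3: a bare-bones genetic algorithm over subset-inclusion masks.
pop = rng.integers(0, 2, size=(40, k))
for gen in range(60):
    fit = np.array([aic(m) for m in pop])
    parents = pop[np.argsort(fit)[:20]]     # truncation selection
    cut = rng.integers(1, k, size=20)       # one-point crossover
    children = np.array([np.concatenate([parents[i][:c],
                                         parents[(i + 1) % 20][c:]])
                         for i, c in enumerate(cut)])
    flips = rng.random(children.shape) < 0.02   # bit-flip mutation
    children = np.where(flips, 1 - children, children)
    pop = np.vstack([parents, children])

best = pop[np.argmin([aic(m) for m in pop])]
selected = sorted(int(j) for j in keep[np.flatnonzero(best)])
print("selected wavelet-coefficient columns:", selected)
```

In the paper this search is run separately for each response variable, so each response can end up with its own wavelet transform and its own subset model; the sketch above shows a single response only.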