Constrain to perform: Regularization of habitat models

Cited by: 111
Authors
Reineking, B
Schröder, B
Affiliations
[1] Swiss Fed Inst Technol, Dept Environm Sci, CH-8092 Zurich, Switzerland
[2] UFZ Helmholtz Ctr Environm Res, Ctr Environm Res Leipzig Halle, Dept Ecol Modelling, D-04301 Leipzig, Germany
[3] Univ Potsdam, Inst Geoecol, D-14415 Potsdam, Germany
Keywords
regularization; habitat models; logistic regression; Lasso; Ridge; penalized maximum likelihood; prediction; model selection; variable selection
DOI
10.1016/j.ecolmodel.2005.10.003
Chinese Library Classification
Q14 [Ecology (Bioecology)]
Discipline classification code
071012; 0713
Abstract
Predictive habitat models are an important tool for ecological research and conservation. A major cause of unreliable models is excessive model complexity, and regularization methods aim to improve the predictive performance by adequately constraining model complexity. We compare three regularization methods for logistic regression: variable selection, lasso, and ridge. They differ in the way model complexity is measured: variable selection uses the number of estimated parameters, the lasso uses the sum of the absolute values of the parameter estimates, and the ridge uses the sum of the squared values of the parameter estimates. We performed a simulation study with environmental data of a real landscape and artificial species occupancy data. We investigated the effect of three factors on relative model performance: (1) the number of parameters (16, 10, 6, 2) in the 'true' model that determined the distribution of the artificial species, (2) the prevalence, i.e. the proportion of sites occupied by the species, and (3) the sample size (measured in events per variable, EPV). Regularization improved model discrimination and calibration. However, no regularization method performed best under all circumstances: the ridge generally performed best in the 16-parameter scenario. The lasso generally performed best in the 10-parameter scenario. Variable selection with AIC was best at large sample sizes (EPV >= 10) when less than half of the variables influenced the species distribution. However, at low sample sizes (EPV < 10), ridge and lasso always performed best, regardless of the parameter scenario or prevalence. Overall, calibration was best in ridge models. Other methods showed overconfidence, particularly at low sample sizes. The percentage of correctly identified models was low for both lasso and variable selection. Variable selection should be used with caution. Although it can produce the best performing models under certain conditions, these situations are difficult to infer from the data. Ridge and lasso are risk-averse model strategies that can be expected to perform well under a wide range of underlying species-habitat relationships, particularly at small sample sizes. © 2005 Elsevier B.V. All rights reserved.
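The three complexity measures named in the abstract can be illustrated with a short sketch. This is not the authors' code: the function names, the choice to leave the intercept unpenalized, the use of the non-zero coefficient count as a stand-in for "number of estimated parameters", and the EPV definition (count of the less frequent outcome divided by the number of candidate parameters) are all assumptions made for illustration.

```python
import numpy as np

def complexity_measures(beta):
    """Three ways to measure the complexity of a coefficient vector.

    Assumption: the intercept is excluded from `beta`.
    """
    beta = np.asarray(beta, dtype=float)
    return {
        # variable selection: number of (non-zero) estimated parameters
        "n_params": int(np.count_nonzero(beta)),
        # lasso: sum of absolute values of the estimates
        "lasso": float(np.sum(np.abs(beta))),
        # ridge: sum of squared values of the estimates
        "ridge": float(np.sum(beta ** 2)),
    }

def penalized_neg_log_lik(beta0, beta, X, y, lam, penalty="ridge"):
    """Negative logistic log-likelihood plus lam * complexity.

    `lam` (penalty weight) and `penalty` are hypothetical parameter names.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.asarray(beta, dtype=float)
    eta = beta0 + X @ beta  # linear predictor
    # Bernoulli log-likelihood: sum(y * eta - log(1 + exp(eta)))
    log_lik = np.sum(y * eta - np.logaddexp(0.0, eta))
    return float(-log_lik + lam * complexity_measures(beta)[penalty])

def events_per_variable(y, n_candidate_params):
    """EPV, assuming 'events' means the less frequent of presence/absence."""
    y = np.asarray(y)
    n_events = min(int(np.sum(y == 1)), int(np.sum(y == 0)))
    return n_events / n_candidate_params

measures = complexity_measures([0.5, -2.0, 0.0])
# measures == {"n_params": 2, "lasso": 2.5, "ridge": 4.25}
```

With the same coefficient vector, the ridge measure is dominated by the single large estimate (-2.0), while the lasso measure grows only linearly with it. This difference is one informal way to see why the lasso can shrink some estimates exactly to zero whereas the ridge mainly shrinks large ones.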
Pages: 675-690
Number of pages: 16