Testing the predictive performance of distribution models

Cited by: 181
Authors
Bahn, Volker [1]
McGill, Brian J. [2, 3]
Affiliations
[1] Wright State Univ, Dept Biol Sci, Dayton, OH 45435 USA
[2] Univ Maine, Sch Biol & Ecol, Orono, ME 04469 USA
[3] Univ Maine, Sustainabil Solut Initiat, Orono, ME 04469 USA
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
SPECIES DISTRIBUTION MODELS; SPATIAL AUTOCORRELATION; CLIMATE-CHANGE; ECOLOGICAL THEORY; RANDOM FORESTS; NICHE; POPULATION; GENERALITY; ABUNDANCE; TRANSFERABILITY;
DOI
10.1111/j.1600-0706.2012.00299.x
Chinese Library Classification number
Q14 [Ecology (bioecology)]
Discipline classification codes
071012; 0713
Abstract
Distribution models are used to predict the likelihood of occurrence or abundance of a species at locations where census data are not available. An integral part of modelling is the testing of model performance. We compared different schemes and measures for testing model performance using 79 species from the North American Breeding Bird Survey. The four testing schemes we compared featured increasing independence between test and training data: resubstitution, random data hold-out and two spatially segregated data hold-out designs. The different testing measures also addressed different levels of information content in the dependent variable: regression R² for absolute abundance, squared correlation coefficient r² for relative abundance and AUC/Somers' D for presence/absence. We found that higher levels of independence between test and training data lead to lower assessments of prediction accuracy. Even for data collected independently, spatial autocorrelation leads to dependence between random hold-out test data and training data, and thus to inflated measures of model performance. While there is a general awareness of the importance of autocorrelation to model building and hypothesis testing, its consequences via violation of independence between training and testing data have not been addressed systematically and comprehensively before. Furthermore, increasing information content (from correctly classifying presence/absence, to predicting relative abundance, to predicting absolute abundance) leads to decreasing predictive performance. The current tests for presence/absence distribution models are typically overly optimistic because a) the test and training data are not independent and b) the correct classification of presence/absence has a relatively low information content and thus capability to address ecological and conservation questions compared to a prediction of abundance.
Meaningful evaluation of model performance requires testing on spatially independent data, if the intended application of the model is to predict into new geographic or climatic space, which arguably is the case for most applications of distribution models.
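The measures and hold-out schemes named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy abundance values, the single `x` coordinate per site, and the choice of splitting by longitude are all assumptions made for illustration, and the R² shown is one common definition (one minus residual over total sum of squares about the 1:1 line), which may differ from the one used in the paper.

```python
import random
from statistics import mean


def regression_r2(observed, predicted):
    """R^2 about the 1:1 line: 1 - SS_residual / SS_total (absolute abundance)."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def squared_correlation(observed, predicted):
    """Squared Pearson correlation r^2 (relative abundance)."""
    mo, mp = mean(observed), mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


def auc(presence, score):
    """AUC as the share of presence/absence pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(presence, score) if y]
    neg = [s for y, s in zip(presence, score) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def somers_d(presence, score):
    """Somers' D rescales AUC from [0, 1] to [-1, 1]."""
    return 2.0 * auc(presence, score) - 1.0


def random_holdout(sites, test_fraction=0.5, seed=0):
    """Random hold-out: test sites are interspersed with training sites."""
    shuffled = sites[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def spatial_holdout(sites, test_fraction=0.5):
    """Spatially segregated hold-out: the easternmost sites form the test set."""
    ordered = sorted(sites, key=lambda s: s["x"])
    split = len(ordered) - int(len(ordered) * test_fraction)
    return ordered[:split], ordered[split:]  # (train, test)


# Toy data (invented): predictions are exactly twice the observed abundance.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 4.0, 6.0, 8.0]
presence = [o > 2.0 for o in observed]  # hypothetical presence threshold

print("R^2 (absolute abundance):", regression_r2(observed, predicted))
print("r^2 (relative abundance):", squared_correlation(observed, predicted))
print("AUC, Somers' D (presence/absence):",
      auc(presence, predicted), somers_d(presence, predicted))

# Hypothetical sites along a west-east transect.
sites = [{"id": i, "x": float(i)} for i in range(10)]
train, test = spatial_holdout(sites)
print("spatial train:", [s["id"] for s in train], "test:", [s["id"] for s in test])
```

In this toy case the ranking of sites is perfect, so r², AUC and Somers' D all reach their maximum, while R² is strongly negative because the predicted abundances are off by a factor of two. Resubstitution corresponds to testing on the training sites themselves; the random split leaves test sites next to training sites, whereas the spatial split separates them geographically.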
Pages: 321-331
Number of pages: 11