Evaluating presence-absence models in ecology: the need to account for prevalence

Cited by: 1368
Authors
Manel, S
Williams, HC
Ormerod, SJ
Affiliations
[1] Cardiff Univ, Sch Biosci, Catchment Res Grp, Cardiff CF1 3TL, S Glam, Wales
[2] Univ Grenoble 1, Lab Biol Populat Altitude, CNRS, UMR 5553, F-38041 Grenoble 09, France
Keywords
Cohen's kappa; logistic regression; model performance; model testing; ROC; species; validation
DOI
10.1046/j.1365-2664.2001.00647.x
Chinese Library Classification number
X176 [Biodiversity conservation]
Discipline classification code
090705
Abstract
1. Models for predicting the distribution of organisms from environmental data are widespread in ecology and conservation biology. Their performance is invariably evaluated from the percentage success at predicting occurrence at test locations.
2. Using logistic regression with real data from 34 families of aquatic invertebrates in 180 Himalayan streams, we illustrate how this widespread measure of predictive accuracy is affected systematically by the prevalence (i.e. the frequency of occurrence) of the target organism. Many evaluations of presence-absence models by ecologists are inherently misleading.
3. With the same invertebrate models, we examined alternative performance measures used in remote sensing and medical diagnostics. We particularly explored receiver-operating characteristic (ROC) plots, from which were derived (i) the area under each curve (AUC), considered an effective indicator of model performance independent of the threshold probability at which the presence of the target organism is accepted, and (ii) optimized probability thresholds that maximize the percentage of true absences and presences that are correctly identified. We also evaluated Cohen's kappa, a measure of the proportion of all possible cases of presence or absence that are predicted correctly after accounting for chance effects.
4. AUC measures from ROC plots were independent of prevalence, but highly significantly correlated with the much more easily computed kappa. Moreover, when applied in predictive mode to test data, models with thresholds optimized by ROC erroneously overestimated true occurrence among scarcer organisms, often those of greatest conservation interest. We advocate caution in using ROC methods to optimize thresholds required for real prediction.
5. Our strongest recommendation is that ecologists reduce their reliance on prediction success as a performance measure in presence-absence modelling. Cohen's kappa provides a simple, effective, standardized and appropriate statistic for evaluating or comparing presence-absence models, even those based on different statistical algorithms. None of the performance measures we examined tests the statistical significance of predictive accuracy, and we identify this as a priority area for research and development.
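The three performance measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names (confusion_counts, prediction_success, cohen_kappa, auc) are hypothetical, the formulas are the standard textbook definitions for a 2x2 confusion matrix and the rank-based AUC, and the example counts are invented purely for illustration.

```python
def confusion_counts(observed, predicted_prob, threshold=0.5):
    """Return (a, b, c, d) = (true presences, false presences,
    false absences, true absences) at the given probability threshold."""
    a = b = c = d = 0
    for obs, prob in zip(observed, predicted_prob):
        predicted_present = prob >= threshold
        if predicted_present and obs:
            a += 1
        elif predicted_present and not obs:
            b += 1
        elif obs:
            c += 1
        else:
            d += 1
    return a, b, c, d


def prediction_success(a, b, c, d):
    """Proportion of all sites predicted correctly (overall accuracy)."""
    return (a + d) / (a + b + c + d)


def cohen_kappa(a, b, c, d):
    """Agreement between prediction and observation, corrected for chance."""
    n = a + b + c + d
    p_observed = (a + d) / n
    # Chance agreement from the row and column marginal totals.
    p_chance = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    if p_chance == 1.0:
        return 0.0  # degenerate table: no agreement possible beyond chance
    return (p_observed - p_chance) / (1.0 - p_chance)


def auc(observed, predicted_prob):
    """Threshold-independent AUC: probability that a randomly chosen presence
    site receives a higher predicted probability than a randomly chosen
    absence site (ties count as one half)."""
    presences = [p for o, p in zip(observed, predicted_prob) if o]
    absences = [p for o, p in zip(observed, predicted_prob) if not o]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in presences
        for q in absences
    )
    return wins / (len(presences) * len(absences))


# Invented example: an organism present at 5 of 100 sites, and a trivial
# "model" that predicts absence everywhere (a=0, b=0, c=5, d=95).
rare_success = prediction_success(0, 0, 5, 95)
rare_kappa = cohen_kappa(0, 0, 5, 95)
```

In this invented case the always-absent model scores a prediction success of 0.95 simply because the organism is rare, while kappa is 0, which is the kind of prevalence effect the abstract warns about.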
Pages: 921-931
Page count: 11