Effects of sample survey design on the accuracy of classification tree models in species distribution models

Cited by: 114
Authors
Edwards, Thomas C., Jr.
Cutler, D. Richard
Zimmermann, Niklaus E.
Geiser, Linda
Moisen, Gretchen G.
Affiliations
[1] Utah State Univ, USGS Utah Cooperat Res Unit, Coll Nat Resources, Logan, UT 84322 USA
[2] Utah State Univ, Dept Math & Stat, Logan, UT 84322 USA
[3] Swiss Fed Res Inst WSL, Dept Landscape Res, CH-8903 Birmensdorf, Switzerland
[4] USDA Forest Serv, Siuslaw Natl Forest, Corvallis, OR 97339 USA
[5] USDA Forest Serv, Rocky Mt Res Stn, Ogden, UT 84401 USA
Keywords
model accuracy; sample survey; study design; classification trees; lichens; accuracy assessment; probability samples; non-probability samples
DOI
10.1016/j.ecolmodel.2006.05.016
Chinese Library Classification number
Q14 [Ecology (bioecology)]
Discipline classification code
071012; 0713
Abstract
We evaluated the effects of probabilistic (hereafter DESIGN) and non-probabilistic (PURPOSIVE) sample surveys on resultant classification tree models for predicting the presence of four lichen species in the Pacific Northwest, USA. Models derived from both survey forms were assessed using an independent data set (EVALUATION). Measures of accuracy as gauged by resubstitution rates were similar for each lichen species irrespective of the underlying sample survey form. Cross-validation estimates of prediction accuracies were lower than resubstitution accuracies for all species and both design types, and in all cases were closer to the true prediction accuracies based on the EVALUATION data set. We argue that greater emphasis should be placed on calculating and reporting cross-validation accuracy rates rather than simple resubstitution accuracy rates. Evaluation of the DESIGN and PURPOSIVE tree models on the EVALUATION data set shows significantly lower prediction accuracy for the PURPOSIVE tree models relative to the DESIGN models, indicating that non-probabilistic sample surveys may generate models with limited predictive capability. These differences were consistent across all four lichen species, with 11 of the 12 possible species and sample survey type comparisons having significantly lower accuracy rates. Some differences in accuracy were as large as 50%. The classification tree structures also differed considerably both among and within the modelled species, depending on the sample survey form. Overlap in the predictor variables selected by the DESIGN and PURPOSIVE tree models ranged from only 20% to 38%, indicating the classification trees fit the two evaluated survey forms on different sets of predictor variables. The magnitude of these differences in predictor variables throws doubt on ecological interpretation derived from prediction models based on non-probabilistic sample surveys. (c) 2006 Elsevier B.V. All rights reserved.
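The distinction the abstract draws between resubstitution, cross-validation, and independent-evaluation accuracy can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic (not the lichen surveys), the tree is a simplified CART-like learner written from scratch (not the authors' software), and all function names (`fit_tree`, `cv_accuracy`, `simulate`) are hypothetical.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def majority(labels):
    """Majority class of 0/1 labels (ties go to 1)."""
    return int(2 * sum(labels) >= len(labels))

def fit_tree(X, y, depth=0, max_depth=6):
    """Grow a binary tree by greedy Gini splits; leaves are class labels."""
    if depth == max_depth or len(set(y)) <= 1:
        return majority(y)
    best = None
    for j in range(len(X[0])):
        # Candidate thresholds: every distinct value except the smallest,
        # so both sides of the split are always non-empty.
        for t in sorted(set(row[j] for row in X))[1:]:
            left = [c for row, c in zip(X, y) if row[j] < t]
            right = [c for row, c in zip(X, y) if row[j] >= t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:  # no feature has more than one distinct value
        return majority(y)
    _, j, t = best
    lo = [(row, c) for row, c in zip(X, y) if row[j] < t]
    hi = [(row, c) for row, c in zip(X, y) if row[j] >= t]
    return (j, t,
            fit_tree([r for r, _ in lo], [c for _, c in lo], depth + 1, max_depth),
            fit_tree([r for r, _ in hi], [c for _, c in hi], depth + 1, max_depth))

def predict(tree, row):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] < t else right
    return tree

def accuracy(tree, X, y):
    """Proportion of rows classified correctly."""
    return sum(predict(tree, r) == c for r, c in zip(X, y)) / len(y)

def cv_accuracy(X, y, k=5, max_depth=6):
    """k-fold cross-validation: each row is predicted by a tree that never saw it."""
    idx = list(range(len(y)))
    correct = 0
    for fold in (idx[i::k] for i in range(k)):
        held = set(fold)
        X_tr = [X[i] for i in idx if i not in held]
        y_tr = [y[i] for i in idx if i not in held]
        tree = fit_tree(X_tr, y_tr, max_depth=max_depth)
        correct += sum(predict(tree, X[i]) == y[i] for i in fold)
    return correct / len(y)

def simulate(n, rng):
    """Synthetic presence/absence: a noisy response to the first of three predictors."""
    X, y = [], []
    for _ in range(n):
        row = [rng.random() for _ in range(3)]
        p_present = 0.8 if row[0] > 0.5 else 0.2
        X.append(row)
        y.append(int(rng.random() < p_present))
    return X, y

rng = random.Random(0)
X_train, y_train = simulate(200, rng)   # stands in for a survey sample
X_eval, y_eval = simulate(2000, rng)    # stands in for an independent evaluation set

tree = fit_tree(X_train, y_train)
resub = accuracy(tree, X_train, y_train)  # same data used to fit and to score
cv = cv_accuracy(X_train, y_train)        # held-out folds of the training data
indep = accuracy(tree, X_eval, y_eval)    # data never used in fitting

print(f"resubstitution={resub:.3f}  cross-validation={cv:.3f}  independent={indep:.3f}")
```

In a setup like this, the resubstitution figure is usually optimistic because the tree has partly fitted label noise, while the cross-validation estimate tends to sit nearer the independent-evaluation figure. The sketch does not reproduce the DESIGN versus PURPOSIVE comparison, since both synthetic samples here are drawn from the same distribution.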
Pages: 132-141
Number of pages: 10