Predictive modeling of groundwater nitrate pollution using Random Forest and multisource variables related to intrinsic and specific vulnerability: A case study in an agricultural setting (Southern Spain)

Cited by: 327
Authors
Rodriguez-Galiano, Victor [1]
Mendes, Maria Paula [2]
Jose Garcia-Soldado, Maria [3]
Chica-Olmo, Mario [3]
Ribeiro, Luis [2]
Affiliations
[1] Univ Southampton, Sch Geog, Southampton SO17 1BJ, Hants, England
[2] Univ Lisbon, Inst Super Tecn, CVRM, P-1049001 Lisbon, Portugal
[3] Univ Granada, Dept Geodinam, E-18071 Granada, Spain
Keywords
Random Forest; Groundwater; Vulnerability assessment; Machine learning techniques; Nitrates; PRINCIPAL COMPONENT ANALYSIS; ARTIFICIAL NEURAL-NETWORKS; MODIFIED DRASTIC MODEL; PESTICIDE CONTAMINATION; GENETIC ALGORITHM; DECISION TREES; LANDSAT TM; CLASSIFICATION; SELECTION; VECTOR
DOI
10.1016/j.scitotenv.2014.01.001
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
Watershed management decisions need robust methods that allow accurate predictive modeling of pollutant occurrences. Random Forest (RF) is a powerful data-driven machine learning method that is rarely used in water resources studies and thus has not been evaluated thoroughly in this field compared with more conventional pattern recognition techniques. Key advantages of RF include its non-parametric nature, high predictive accuracy, and capability to determine variable importance. This last characteristic can be used to better understand the individual role and the combined effect of explanatory variables in both protecting groundwater from, and exposing it to, a pollutant. In this paper, the performance of RF regression for predictive modeling of nitrate pollution is explored, based on intrinsic and specific vulnerability assessment of the Vega de Granada aquifer. The applicability of this new machine learning technique is demonstrated in an agriculture-dominated area where nitrate concentrations in groundwater can exceed the trigger value of 50 mg/L at many locations. A comprehensive GIS database of twenty-four parameters related to intrinsic hydrogeologic properties, driving forces, remotely sensed variables and physical-chemical variables measured in situ was used as input to build different predictive models of nitrate pollution. RF measures of importance were also used to define the most significant predictors of nitrate pollution in groundwater, allowing the pollution sources (pressures) to be established. The potential of RF for generating a vulnerability map to nitrate pollution is assessed considering multiple criteria related to variations in the algorithm parameters and the accuracy of the maps. The performance of RF is also evaluated in comparison to the logistic regression (LR) method using different efficiency measures to ensure their generalization ability. Prediction results show the ability of RF to build accurate models with strong predictive capabilities. (c) 2014 Elsevier B.V. All rights reserved.
Pages: 189-206
Page count: 18