Generalisation for neural networks through data sampling and training procedures, with applications to streamflow predictions

Cited: 66
Authors
Anctil, F
Lauzon, N
Affiliations
[1] Univ Laval, Dept Civil Engn, Quebec City, PQ G1K 7P4, Canada
[2] Golder Assoc Ltd, Calgary, AB T2P 3T1, Canada
Keywords
neural networks; generalisation; stacking; bagging; boosting; stop-training; Bayesian regularisation; streamflow modelling;
DOI
10.5194/hess-8-940-2004
Chinese Library Classification
P [Astronomy, Earth Sciences];
Discipline code
07;
Abstract
Since the 1990s, neural networks have been applied to many studies in hydrology and water resources. Extensive reviews of neural network modelling have identified the major issues affecting modelling performance; one of the most important is generalisation, which refers to building models that can infer the behaviour of the system under study not only for conditions represented in the data employed for training and testing but also for conditions absent from those data sets yet inherent to the system. This work compares five generalisation approaches: stop training, Bayesian regularisation, stacking, bagging and boosting. All have been tested with neural networks in various scientific domains: stop training and stacking have been applied regularly in hydrology and water resources for some years, while Bayesian regularisation, bagging and boosting have been less common. The comparison is applied to streamflow modelling with multi-layer perceptron neural networks trained with the Levenberg-Marquardt algorithm. Six catchments with diverse hydrological behaviours are employed as test cases to draw general conclusions and guidelines on the use of generalisation techniques for practitioners in hydrology and water resources. All generalisation approaches improve performance compared with standard neural networks without generalisation. Stacking, bagging and boosting, which act on the construction of the training sets, provide a larger improvement over standard models than stop training and Bayesian regularisation, which regulate the training algorithm. Stacking performs better than the others, although its advantage over bagging and boosting is slight and is not consistent from one catchment to another. For a good combination of improvement and stability in modelling performance, the joint use of stop training or Bayesian regularisation with either bagging or boosting is recommended.
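The abstract notes that bagging acts on the construction of the training sets: each ensemble member is trained on a bootstrap resample, and member predictions are averaged. A minimal numpy sketch of that idea follows; the tiny one-hidden-layer perceptron trained by plain gradient descent is a hypothetical stand-in for the paper's Levenberg-Marquardt-trained multi-layer perceptrons, and the synthetic 1-D data is illustrative only (not streamflow data).

```python
import numpy as np

rng = np.random.default_rng(0)

def train_mlp(X, y, hidden=8, epochs=500, lr=0.05):
    """Train a tiny one-hidden-layer perceptron by gradient descent
    (a simplified stand-in for Levenberg-Marquardt training)."""
    n, d = X.shape
    W1 = rng.normal(0, 0.5, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, hidden);      b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)            # hidden activations
        err = (H @ W2 + b2) - y             # gradient of 0.5*MSE w.r.t. output
        gW2 = H.T @ err / n; gb2 = err.mean()
        dH = np.outer(err, W2) * (1 - H**2) # backprop through tanh
        gW1 = X.T @ dH / n;  gb1 = dH.mean(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2
    return lambda Xq: np.tanh(Xq @ W1 + b1) @ W2 + b2

def bagging_ensemble(X, y, n_members=10):
    """Bagging: each member sees a bootstrap resample (sampling with
    replacement) of the training set; predictions are averaged."""
    members = []
    for _ in range(n_members):
        idx = rng.integers(0, len(X), len(X))  # bootstrap indices
        members.append(train_mlp(X[idx], y[idx]))
    return lambda Xq: np.mean([m(Xq) for m in members], axis=0)

# toy 1-D regression demo
X = np.linspace(-2, 2, 80).reshape(-1, 1)
y = np.sin(2 * X[:, 0]) + rng.normal(0, 0.1, 80)
predict = bagging_ensemble(X, y)
print(predict(X).shape)  # → (80,)
```

Averaging over bootstrap-trained members reduces the variance contributed by any single network's fit to its particular sample, which is the mechanism behind the improved generalisation reported in the abstract.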
Pages: 940-958
Page count: 19