Optimal ensemble averaging of neural networks

Cited by: 157
Authors
Naftaly, U [1]
Intrator, N [1]
Horn, D [1]
Affiliations
[1] Tel Aviv University, Raymond & Beverly Sackler Faculty of Exact Sciences, School of Mathematical Sciences, IL-69978 Tel Aviv, Israel
DOI
10.1088/0954-898X/8/3/004
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Based on an observation about the different effects of ensemble averaging on the bias and variance portions of the prediction error, we discuss training methodologies for ensembles of networks. We demonstrate the effect of variance reduction and present a method of extrapolation to the limit of an infinite ensemble. A significant reduction of variance is obtained by averaging just over the initial conditions of the neural networks, without varying architectures or training sets. The minimum of the ensemble prediction error is reached later than that of a single network. In the vicinity of the minimum, the ensemble prediction error appears to be flatter than that of the single network, thus simplifying the optimal stopping decision. The results are demonstrated on sunspots data, where the predictions are among the best obtained, and on the 1993 energy prediction competition data set B.
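The procedure summarized in the abstract can be illustrated in a few lines of code. The Python sketch below is not the authors' implementation; it is a minimal example, assuming scikit-learn's MLPRegressor as the base network and toy regression data standing in for a time series, of training an ensemble whose members differ only in their random initial weights and then averaging their predictions, which reduces the variance component of the error while leaving the bias essentially unchanged.

# Minimal sketch (not the authors' code): ensemble averaging over random
# initial conditions only. Architecture and training data are identical for
# all members; only the weight initialization (random_state) varies.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Toy regression data standing in for a time-series prediction task.
X = rng.uniform(-1.0, 1.0, size=(200, 5))
y = np.sin(X.sum(axis=1)) + 0.1 * rng.standard_normal(200)
X_test = rng.uniform(-1.0, 1.0, size=(50, 5))

def train_member(seed):
    """Train one ensemble member; only the initial weights differ via `seed`."""
    net = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=seed)
    return net.fit(X, y)

ensemble = [train_member(seed) for seed in range(10)]

# Ensemble prediction: the plain average of the members' outputs.
predictions = np.mean([net.predict(X_test) for net in ensemble], axis=0)

In the paper, the same kind of averaging is carried out for ensembles of increasing size and the resulting error is extrapolated to the infinite-ensemble limit; the sketch above only shows the plain average for one fixed ensemble size.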
Pages: 283-296
Page count: 14
Related papers (19 in total)
[1] [Anonymous] (2018). Time Series Prediction.
[2] Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14(2), 179-211.
[3] Elman, J. L., & Zipser, D. (1988). Learning the hidden structure of speech. Journal of the Acoustical Society of America, 83(4), 1615-1626.
[4] Geman, S., Bienenstock, E., & Doursat, R. (1992). Neural networks and the bias/variance dilemma. Neural Computation, 4(1), 1-58.
[5] Hertz, J. (1991). Santa Fe Institute Lecture Notes, Vol. 1.
[6] Hinton, G. E. (1986). Proceedings of the 8th Annual Conference of the Cognitive Science Society, p. 12.
[7] Lincoln, W. P. (1990). Advances in Neural Information Processing Systems, Vol. 2, p. 650.
[8] MacKay, D. (1994). ASHRAE Transactions, 100, p. 1053.
[9] Morris, J. (1977). Journal of the Royal Statistical Society, Series A, 140, p. 437.
[10] Nowlan, S. J., & Hinton, G. E. (1992). Simplifying neural networks by soft weight-sharing. Neural Computation, 4(4), 473-493.