Time series analysis and prediction using gated experts with application to energy demand forecasts

Cited by: 14
Author
Weigend, AS
Affiliation
[1] Information Systems Department, Stern School of Business, New York University, New York
Funding
National Science Foundation (USA);
DOI
10.1080/088395196118443
CLC number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
In the analysis and prediction of real-world systems, two of the key problems are nonstationarity (often in the form of switching between regimes) and overfitting (particularly serious for noisy processes). This article addresses these problems using gated experts, consisting of a (nonlinear) gating network and several (also nonlinear) competing experts. Each expert learns to predict the conditional mean, and each expert adapts its width to match the noise level in its regime. The gating network learns to predict the probability of each expert, given the input. This article focuses on the case where the gating network bases its decision on information from the inputs. This can be contrasted with hidden Markov models, where the decision is based on the previous state(s) (i.e., on the output of the gating network at the previous time step), as well as with averaging over several predictors. In contrast, gated experts soft-partition the input space. This article discusses the underlying statistical assumptions, derives the weight update rules, and compares the performance of gated experts to standard methods on three time series: (1) a computer-generated series, obtained by randomly switching between two nonlinear processes; (2) a time series from the Santa Fe Time Series Competition (the light intensity of a laser in a chaotic state); and (3) the daily electricity demand of France, a real-world multivariate problem with structure on several timescales. The main results are: (1) the gating network correctly discovers the different regimes of the process; (2) the widths associated with each expert are important for the segmentation task (and they can be used to characterize the subprocesses); and (3) there is less overfitting compared to single networks (homogeneous multilayer perceptrons), since the experts learn to match their variances to the (local) noise levels. This can be viewed as matching the local complexity of the model to the local complexity of the data.
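The architecture described in the abstract can be sketched as follows. This is a minimal illustrative forward pass, not the paper's implementation: linear experts and a linear gating network stand in for the paper's nonlinear networks, and all shapes, weights, and variable names (`W_gate`, `W_exp`, `sigma`) are assumptions. Each expert produces a conditional mean, carries its own noise width, and the gating network soft-partitions the input space by turning the input into a probability distribution over experts.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, d = 3, 4

# Hypothetical parameters: one linear gating row and one linear expert per
# regime, plus a per-expert noise width (the "width" the abstract refers to).
W_gate = rng.normal(size=(n_experts, d))
W_exp = rng.normal(size=(n_experts, d))
sigma = np.array([0.1, 0.5, 1.0])

def softmax(z):
    # Numerically stable softmax: shift by the max before exponentiating.
    e = np.exp(z - z.max())
    return e / e.sum()

def predict(x):
    """Mixture prediction: gate probabilities weight the expert means."""
    g = softmax(W_gate @ x)   # P(expert i | input x) -- soft partition
    mu = W_exp @ x            # each expert's conditional mean
    return g @ mu, g

x = rng.normal(size=d)
y_hat, g = predict(x)
assert np.isclose(g.sum(), 1.0)  # gate outputs form a probability distribution
```

During training (omitted here), the posterior responsibilities combine the gate probabilities with each expert's Gaussian likelihood `N(y; mu_i, sigma_i^2)`, which is how the per-expert widths let each expert match the local noise level of its regime.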
Pages: 583-624
Page count: 42