Using support vector machines for time series prediction

Cited by: 298
Authors
Thiessen, U
van Brakel, R
de Weijer, AP
Melssen, WJ
Buydens, LMC
Affiliations
[1] Univ Nijmegen, Analyt Chem Lab, NL-6525 ED Nijmegen, Netherlands
[2] Teijin Twaron Res Inst, NL-6700 TC Arnhem, Netherlands
Keywords
time series; process chemometrics; SVM-support vector machines; ARMA-autoregressive moving average; RNN-recurrent neural networks
DOI
10.1016/S0169-7439(03)00111-4
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Time series prediction can be a very useful tool in the field of process chemometrics to forecast and to study the behaviour of key process parameters over time. This makes it possible to give early warnings of possible process malfunctioning. In this paper, time series prediction is performed by support vector machines (SVMs), Elman recurrent neural networks, and autoregressive moving average (ARMA) models. The three methods are compared on the basis of their predictive ability. In the field of chemometrics, SVMs are hardly used even though they have many theoretical advantages for both classification and regression tasks. These advantages stem from the specific formulation of a (convex) objective function with constraints, which is solved using Lagrange multipliers and has the following characteristics: (1) a global optimal solution exists and will be found, (2) the result is a general solution that avoids overtraining, (3) the solution is sparse, so only a limited set of training points contributes to it, and (4) nonlinear solutions can be calculated efficiently owing to the use of inner products. The method comparison is performed on two simulated data sets and one real-world industrial data set. The simulated data sets are a data set generated according to the ARMA principles and the Mackey-Glass data set, which is often used for benchmarking. The first data set is relatively easy, whereas the second is a more difficult nonlinear chaotic data set. The real-world data set stems from a filtration unit in a yarn spinning process and contains differential pressure values. These values are a measure of the contamination level of a filter. For practical purposes, it is very important to predict these values accurately because they play a crucial role in maintaining the quality of the process considered. As expected, the ARMA model performs best for the ARMA data set, while the SVM and the Elman networks perform similarly.
For the more difficult benchmark data set, the SVM outperforms the ARMA model and in most cases outperforms the best of several Elman neural networks. For the real-world set, the SVM was trained on a training set containing only one tenth of the points of the original training set that was used for the other methods. This was done to test its performance when only few data are available. Using the same test set for all methods, the prediction results were equally good for the SVM and the ARMA model, whereas the Elman network could not be used to predict these series. © 2003 Elsevier B.V. All rights reserved.
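As a rough illustration of the Mackey-Glass benchmark mentioned in the abstract, the sketch below generates a series from the standard Mackey-Glass delay equation and turns it into lagged input/target pairs for a one-step-ahead regressor. The abstract does not state the settings used in the paper, so the parameter values (`tau=17`, `beta=0.2`, `gamma=0.1`, `power=10`), the constant initial history, the Euler step, and the function names `mackey_glass` and `make_lagged` are all assumptions made for illustration only.

```python
def mackey_glass(n_points, tau=17, beta=0.2, gamma=0.1, power=10, x0=1.2, dt=1.0):
    """Euler integration of dx/dt = beta*x(t-tau)/(1 + x(t-tau)**power) - gamma*x(t).

    All default values are common textbook choices, not taken from the paper.
    """
    delay_steps = int(round(tau / dt))
    # Assumed initial condition: constant history x(t) = x0 for t <= 0.
    history = [x0] * (delay_steps + 1)
    for _ in range(n_points):
        x_now = history[-1]
        x_delayed = history[-1 - delay_steps]
        dx = beta * x_delayed / (1.0 + x_delayed ** power) - gamma * x_now
        history.append(x_now + dt * dx)
    # Drop the artificial initial history and return only the generated points.
    return history[delay_steps + 1:]


def make_lagged(series, n_lags, horizon=1):
    """Build (inputs, targets): each input is n_lags consecutive values,
    each target is the value `horizon` steps after the end of that window."""
    inputs, targets = [], []
    last_start = len(series) - n_lags - horizon
    for start in range(last_start + 1):
        inputs.append(list(series[start:start + n_lags]))
        targets.append(series[start + n_lags + horizon - 1])
    return inputs, targets


series = mackey_glass(500)
X, y = make_lagged(series, n_lags=4)
```

The resulting `X` and `y` could then be passed to any regression model, for example a support vector regressor, an ARMA-style linear model, or a recurrent network, to compare predictive ability on a held-out part of the series.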
Pages: 35-49
Number of pages: 15