Distance measures for effective clustering of ARIMA time-series

Cited by: 225
Authors
Kalpakis, K [1]
Gada, D [1]
Puttagunta, V [1]
Affiliations
[1] Univ Maryland Baltimore Cty, CSEE Dept, Baltimore, MD 21250 USA
Source
2001 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS | 2001
Keywords
time-series; similarity measures; clustering; ARIMA models; cepstral coefficients
DOI
10.1109/ICDM.2001.989529
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many environmental and socioeconomic time-series data can be adequately modeled using Auto-Regressive Integrated Moving Average (ARIMA) models. We call such time-series ARIMA time-series. We consider the problem of clustering ARIMA time-series. We propose the use of the Linear Predictive Coding (LPC) cepstrum of time-series for clustering ARIMA time-series, by using the Euclidean distance between the LPC cepstra of two time-series as their dissimilarity measure. We demonstrate that LPC cepstral coefficients have the desired features for accurate clustering and efficient indexing of ARIMA time-series. For example, a few LPC cepstral coefficients are sufficient to discriminate between time-series that are modeled by different ARIMA models. In fact, this approach requires fewer coefficients than traditional approaches, such as DFT and DWT. The proposed distance measure can be used for measuring the similarity between different ARIMA models as well. We cluster ARIMA time-series using the Partition Around Medoids method with various similarity measures. We present experimental results demonstrating that, using the proposed measure, we achieve significantly better clusterings of ARIMA time-series data as compared to clusterings obtained by using other traditional similarity measures, such as DFT, DWT, PCA, etc. Experiments were performed on both simulated and real data.
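The dissimilarity measure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the autoregressive coefficients of each series have already been estimated, that the model is written as x[t] + a_1 x[t-1] + ... + a_p x[t-p] = e[t] (the sign convention is an assumption), and it uses the standard recursion for the cepstrum of an all-pole model. The function names lpc_cepstrum and cepstral_distance are hypothetical.

```python
import math


def lpc_cepstrum(ar_coeffs, n_coeffs):
    """Return the first n_coeffs LPC cepstral coefficients c_1..c_n.

    Assumed convention: x[t] + a_1 x[t-1] + ... + a_p x[t-p] = e[t],
    with ar_coeffs = [a_1, ..., a_p].
    """
    p = len(ar_coeffs)
    a = [0.0] + list(ar_coeffs)      # shift to 1-based indexing
    c = [0.0] * (n_coeffs + 1)       # c[0] is unused
    for n in range(1, n_coeffs + 1):
        # The -a_n term is present only while n <= p.
        acc = -a[n] if n <= p else 0.0
        # Sum over m = 1 .. min(n - 1, p).
        for m in range(1, min(n - 1, p) + 1):
            acc -= (1.0 - m / n) * a[m] * c[n - m]
        c[n] = acc
    return c[1:]


def cepstral_distance(ar_x, ar_y, n_coeffs=5):
    """Euclidean distance between the LPC cepstra of two AR models."""
    cx = lpc_cepstrum(ar_x, n_coeffs)
    cy = lpc_cepstrum(ar_y, n_coeffs)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(cx, cy)))


if __name__ == "__main__":
    # Two illustrative AR(1) models; the coefficients are made up.
    print(lpc_cepstrum([0.5], 3))
    print(cepstral_distance([0.5], [-0.3]))
```

A pairwise matrix of such distances could then be passed to a medoid-based clustering routine. Fitting the ARIMA models and choosing the number of cepstral coefficients are outside the scope of this sketch.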
Pages: 273-280
Number of pages: 8