Classification of time series by shapelet transformation

Cited by: 504
Authors
Hills, Jon [1]
Lines, Jason [1]
Baranauskas, Edgaras [1]
Mapp, James [1]
Bagnall, Anthony [1]
Affiliation
[1] Univ E Anglia, Norwich NR4 7TJ, Norfolk, England
Keywords
STOCKS;
DOI
10.1007/s10618-013-0322-1
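Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
140502 [Artificial intelligence];
Abstract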
Time-series classification (TSC) problems present a specific challenge for classification algorithms: how to measure similarity between series. A shapelet is a time-series subsequence that allows for TSC based on local, phase-independent similarity in shape. Shapelet-based classification uses the similarity between a shapelet and a series as a discriminatory feature. One benefit of the shapelet approach is that shapelets are comprehensible and can offer insight into the problem domain. The original shapelet-based classifier embeds the shapelet-discovery algorithm in a decision tree and uses information gain to assess the quality of candidates, finding a new shapelet at each node of the tree through an enumerative search. Subsequent research has focused mainly on techniques to speed up the search. We examine how best to use the shapelet primitive to construct classifiers. We propose a single-scan shapelet algorithm that finds the best shapelets, which are used to produce a transformed dataset in which each feature represents the distance between a time series and a shapelet. The primary advantages over the embedded approach are that the transformed data can be used in conjunction with any classifier, and that there is no recursive search for shapelets. We demonstrate that the transformed data, in conjunction with more complex classifiers, gives greater accuracy than the embedded shapelet tree. We also evaluate three similarity measures that produce results equivalent to information gain in less time. Finally, we show that by conducting post-transform clustering of shapelets, we can enhance the interpretability of the transformed data. We conduct our experiments on 29 datasets: 17 from the UCR repository, and 12 we provide ourselves.
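The transform described in the abstract can be illustrated with a minimal sketch. It assumes that the distance between a shapelet and a series is the smallest Euclidean distance between the shapelet and any equal-length window of the series, and that both are z-normalised before comparison; the abstract does not state either detail. The function names `z_normalise`, `shapelet_distance`, and `shapelet_transform` are hypothetical, and the sketch takes the shapelets as given rather than performing the single-scan search or the quality assessment.

```python
import math


def z_normalise(values):
    """Return a zero-mean, unit-variance copy; a constant input maps to zeros."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        return [0.0] * n
    return [(v - mean) / std for v in values]


def shapelet_distance(shapelet, series):
    """Smallest Euclidean distance between the shapelet and any window of the series.

    Assumption: each window is z-normalised before comparison, so the
    measure depends on shape rather than on offset or scale.
    """
    m = len(shapelet)
    if m > len(series):
        raise ValueError("shapelet is longer than the series")
    normalised_shapelet = z_normalise(shapelet)
    best = math.inf
    # Sliding the window over every start position makes the match phase-independent.
    for start in range(len(series) - m + 1):
        window = z_normalise(series[start:start + m])
        dist = math.sqrt(
            sum((a - b) ** 2 for a, b in zip(normalised_shapelet, window))
        )
        best = min(best, dist)
    return best


def shapelet_transform(dataset, shapelets):
    """Map each series to one feature per shapelet: its distance to that shapelet."""
    return [[shapelet_distance(sh, ts) for sh in shapelets] for ts in dataset]


if __name__ == "__main__":
    dataset = [
        [0, 0, 1, 2, 1, 0, 0],  # bump in the middle
        [1, 2, 1, 0, 0, 0, 0],  # same bump, shifted to the start
        [0, 1, 0, 1, 0, 1, 0],  # no bump of that shape
    ]
    shapelets = [[1, 2, 1], [2, 1, 0, 0]]
    for row in shapelet_transform(dataset, shapelets):
        print(row)
```

The rows returned by `shapelet_transform` form an ordinary feature matrix, which is why the transformed data can be passed to any classifier rather than only to a shapelet tree.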
Pages: 851-881
Page count: 31