A two-stage methodology for sequence classification based on sequential pattern mining and optimization

Times cited: 34
Authors
Exarchos, Themis P. [1, 2, 3]
Tsipouras, Markos G. [1]
Papaloukas, Costas [4]
Fotiadis, Dimitrios I. [1, 3]
Affiliations
[1] Univ Ioannina, Dept Comp Sci, Unit Med Technol & Intelligent Informat Syst, GR-45110 Ioannina, Greece
[2] Univ Ioannina, Sch Med, Dept Med Phys, GR-45110 Ioannina, Greece
[3] CERETETH, Inst Biomed Technol, GR-41222 Larisa, Greece
[4] Univ Ioannina, Dept Biol Applicat & Technol, GR-45110 Ioannina, Greece
Keywords
sequential pattern mining; sequential pattern matching; sequence classification
DOI
10.1016/j.datak.2008.05.007
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a methodology for sequence classification that combines sequential pattern mining and optimization in a two-stage process. In the first stage, a sequence classification model is defined on the basis of a set of sequential patterns, and two sets of weights are introduced: one for the patterns and one for the classes. In the second stage, an optimization technique is employed to estimate the weight values that maximize classification accuracy. The methodology is evaluated extensively by varying the number of sequences, the number of patterns, and the number of classes, and it is compared with similar sequence classification approaches. © 2008 Elsevier B.V. All rights reserved.
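The abstract describes the two stages only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' method. It assumes that each mined pattern is tied to the class it was mined from, that a pattern "matches" a sequence when its items occur in order with gaps allowed, and that a class score is the class weight times the summed weights of its matching patterns. The abstract does not name the optimization technique, so a simple random-perturbation hill climber on training accuracy is swapped in for stage two. The names `matches`, `classify`, `accuracy`, `fit_weights` and the toy data are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]
Dataset = List[Tuple[Tuple[str, ...], str]]


def matches(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if `pattern` occurs in `sequence` in order, gaps allowed."""
    it = iter(sequence)
    # `item in it` advances the shared iterator, so item order is enforced
    return all(item in it for item in pattern)


def classify(sequence: Sequence[str],
             patterns: Dict[Pattern, str],
             pattern_w: Dict[Pattern, float],
             class_w: Dict[str, float]) -> str:
    """Stage 1 (ASSUMED form): weighted vote of matching patterns, scaled per class."""
    scores = {c: 0.0 for c in class_w}
    for pattern, label in patterns.items():
        if matches(pattern, sequence):
            scores[label] += pattern_w[pattern]
    # ties resolve to the class listed first in class_w
    return max(scores, key=lambda c: class_w[c] * scores[c])


def accuracy(data: Dataset, patterns, pattern_w, class_w) -> float:
    hits = sum(classify(s, patterns, pattern_w, class_w) == y for s, y in data)
    return hits / len(data)


def fit_weights(data: Dataset, patterns: Dict[Pattern, str], classes: List[str],
                iters: int = 200, step: float = 0.5, seed: int = 0):
    """Stage 2 (SWAPPED-IN optimizer): random-perturbation hill climbing on accuracy."""
    rng = random.Random(seed)
    pattern_w = {p: 1.0 for p in patterns}
    class_w = {c: 1.0 for c in classes}
    best = accuracy(data, patterns, pattern_w, class_w)
    for _ in range(iters):
        # pick one weight (pattern or class) and nudge it, keeping it >= 0
        table = pattern_w if rng.random() < 0.5 else class_w
        key = rng.choice(list(table))
        old = table[key]
        table[key] = max(0.0, old + rng.uniform(-step, step))
        acc = accuracy(data, patterns, pattern_w, class_w)
        if acc >= best:
            best = acc          # keep the move (ties allowed, to cross plateaus)
        else:
            table[key] = old    # revert a move that lowered accuracy
    return pattern_w, class_w, best


# Hypothetical toy data: each pattern is tied to the class it was mined from.
PATTERNS: Dict[Pattern, str] = {("a", "b"): "X", ("c", "d"): "Y"}
CLASSES = ["X", "Y"]
DATA: Dataset = [(("a", "x", "b"), "X"), (("c", "d"), "Y"), (("a", "b", "c", "d"), "Y")]

if __name__ == "__main__":
    uniform = accuracy(DATA, PATTERNS, {p: 1.0 for p in PATTERNS}, {c: 1.0 for c in CLASSES})
    pw, cw, best = fit_weights(DATA, PATTERNS, CLASSES)
    print(f"uniform weights: {uniform:.2f}, tuned weights: {best:.2f}")
```

With uniform weights the third toy sequence matches both patterns and the tie goes to "X", giving 2/3 accuracy; tuning can only keep or raise that figure, because a perturbation is kept only when it does not reduce accuracy. A real implementation would estimate the weights on training data and report accuracy on held-out sequences.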
Pages: 467-487
Number of pages: 21