A two-stage methodology for sequence classification based on sequential pattern mining and optimization

Cited by: 34
Authors
Exarchos, Themis P. [1, 2, 3]
Tsipouras, Markos G. [1]
Papaloukas, Costas [4]
Fotiadis, Dimitrios I. [1, 3]
Affiliations
[1] Univ Ioannina, Dept Comp Sci, Unit Med Technol & Intelligent Informat Syst, GR-45110 Ioannina, Greece
[2] Univ Ioannina, Sch Med, Dept Med Phys, GR-45110 Ioannina, Greece
[3] CERETETH, Inst Biomed Technol, GR-41222 Larisa, Greece
[4] Univ Ioannina, Dept Biol Applicat & Technol, GR-45110 Ioannina, Greece
Keywords
sequential pattern mining; sequential pattern matching; sequence classification
DOI
10.1016/j.datak.2008.05.007
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a two-stage methodology for sequence classification that employs sequential pattern mining and optimization. In the first stage, a sequence classification model is defined based on a set of sequential patterns, and two sets of weights are introduced: one for the patterns and one for the classes. In the second stage, an optimization technique is employed to estimate the weight values and achieve optimal classification accuracy. The methodology is evaluated extensively by varying the number of sequences, the number of patterns, and the number of classes, and it is compared with similar sequence classification approaches. (c) 2008 Elsevier B.V. All rights reserved.
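The two stages described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the scoring function or the optimizer, so everything here is an assumption made for illustration: patterns are matched as gapped subsequences, a class score is the class weight times the sum of the weights of its matched patterns, and the weights are chosen by a plain grid search. The names `is_subsequence`, `classify`, `accuracy`, and `fit_weights` are hypothetical and do not come from the paper.

```python
from itertools import product


def is_subsequence(pattern, sequence):
    """Return True if the items of pattern occur in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def classify(sequence, patterns, pattern_w, class_w):
    """Stage 1 (assumed form): weighted vote of the patterns that match.

    patterns is a list of (pattern, class_label) pairs, pattern_w holds one
    weight per pattern, and class_w maps each class label to its weight.
    """
    scores = {c: 0.0 for c in class_w}
    for (pat, label), w in zip(patterns, pattern_w):
        if is_subsequence(pat, sequence):
            scores[label] += w
    # Pick the class with the highest class-weighted score.
    return max(scores, key=lambda c: class_w[c] * scores[c])


def accuracy(data, patterns, pattern_w, class_w):
    """Fraction of (sequence, label) pairs classified correctly."""
    hits = sum(classify(s, patterns, pattern_w, class_w) == y for s, y in data)
    return hits / len(data)


def fit_weights(data, patterns, classes, grid=(0.5, 1.0, 2.0)):
    """Stage 2 stand-in: exhaustive grid search, not the paper's optimizer."""
    best = (-1.0, None, None)
    for pw in product(grid, repeat=len(patterns)):
        for cw_vals in product(grid, repeat=len(classes)):
            cw = dict(zip(classes, cw_vals))
            acc = accuracy(data, patterns, pw, cw)
            if acc > best[0]:
                best = (acc, pw, cw)
    return best


# Toy data: each pattern is tied to one class.
patterns = [(("a", "b"), "X"), (("b", "c"), "Y")]
data = [("ab", "X"), ("bc", "Y"), ("acb", "X"), ("bdc", "Y")]
best_acc, best_pattern_w, best_class_w = fit_weights(data, patterns, ["X", "Y"])
```

Grid search is used only because it is easy to read. Its cost grows exponentially with the number of patterns, so any realistic setting would need a proper optimization technique.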
Pages: 467-487
Page count: 21