One in a million: picking the right patterns

Cited by: 20
Authors
Bringmann, Bjorn [1]
Zimmermann, Albrecht [1]
Affiliations
[1] Katholieke Univ Leuven, Dept Computerwetenschappen, B-3001 Heverlee, Belgium
Keywords
Data mining; Post processing; Pattern reduction;
DOI
10.1007/s10115-008-0136-4
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Constrained pattern mining extracts patterns based on their individual merit. Usually this results in far more patterns than a human expert or a machine learning technique could make use of. Often different patterns or combinations of patterns cover a similar subset of the examples, thus being redundant and not carrying any new information. To remove the redundant information contained in such pattern sets, we propose two general heuristic algorithms, Bouncer and Picker, for selecting a small subset of patterns. We identify several selection techniques for use in this general algorithm and evaluate those on several data sets. The results show that both techniques succeed in severely reducing the number of patterns, while at the same time apparently retaining much of the original information. Additionally, the experiments show that reducing the pattern set indeed improves the quality of classification results. Both results show that the developed solutions are very well suited for the goals we aim at.
Pages: 61-81
Page count: 21
Related papers
14 in total
[1]   On condensed representations of constrained frequent patterns [J].
Bonchi, F ;
Lucchese, C .
KNOWLEDGE AND INFORMATION SYSTEMS, 2006, 9 (02) :180-201
[2]  
BORGELT C, 2004, FIMI 04, V126
[3]   Mining free itemsets under constraints [J].
Boulicaut, JF ;
Jeudy, B .
2001 INTERNATIONAL DATABASE ENGINEERING & APPLICATIONS SYMPOSIUM, PROCEEDINGS, 2001, :322-329
[4]  
BRINGMANN B, 2005, PKDD, P46
[5]  
Calders T., 2002, Principles of Data Mining and Knowledge Discovery. 6th European Conference, PKDD 2002. Proceedings (Lecture Notes in Artificial Intelligence Vol.2431), P74
[6]   Summarization - compressing data into an informative representation [J].
Chandola, Varun ;
Kumar, Vipin .
KNOWLEDGE AND INFORMATION SYSTEMS, 2007, 12 (03) :355-378
[7]  
Dy JG, 2004, J MACH LEARN RES, V5, P845
[8]  
Landwehr N., 2006, AAAI
[9]  
Lavrac N, 2004, LECT NOTES ARTIF INT, V3848, P243
[10]  
LENT B, 1997, ICDE, P220