BruteSuppression: a size reduction method for Apriori rule sets

Cited by: 14
Authors
Hills, Jon [1]
Bagnall, Anthony [1]
de la Iglesia, Beatriz [1]
Richards, Graeme [1]
Affiliations
[1] Univ E Anglia, Sch Comp Sci, Norwich NR4 7TJ, Norfolk, England
Keywords
Apriori; Data mining; Interestingness; Partial classification; Rules; INTERESTINGNESS MEASURES; DISCOVERY; DATABASES;
DOI
10.1007/s10844-012-0232-5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Association rule mining can provide genuine insight into the data being analysed; however, rule sets can be extremely large, and therefore difficult and time-consuming for the user to interpret. We propose reducing the size of Apriori rule sets by removing overlapping rules, and compare this approach with two standard methods for reducing rule set size: increasing the minimum confidence parameter, and increasing the minimum antecedent support parameter. We evaluate the rule sets in terms of confidence and coverage, as well as two rule interestingness measures that favour rules with antecedent conditions that are poor individual predictors of the target class, as we assume that these represent potentially interesting rules. We also examine the distribution of the rules graphically, to assess whether particular classes of rules are eliminated. We show that removing overlapping rules substantially reduces rule set size in most cases, and alters the character of a rule set less than if the standard parameters are used to constrain the rule set to the same size. Based on our results, we aim to extend the Apriori algorithm to incorporate the suppression of overlapping rules.
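The two standard size-reduction methods named in the abstract, raising the minimum confidence and raising the minimum antecedent support, can be illustrated with a minimal sketch. It assumes the usual textbook definitions of both measures; the `Rule` type, the function names, and the toy records are hypothetical and are not taken from the paper. The overlap-suppression method itself is not sketched, because the abstract does not define what counts as an overlapping rule.

```python
from typing import FrozenSet, List, NamedTuple, Sequence


class Rule(NamedTuple):
    """Hypothetical rule representation: antecedent items imply a target class."""
    antecedent: FrozenSet[str]
    target: str


def antecedent_support(rule: Rule, records: Sequence[FrozenSet[str]]) -> float:
    """Fraction of records that satisfy every antecedent condition."""
    matched = sum(1 for items in records if rule.antecedent <= items)
    return matched / len(records)


def confidence(rule: Rule, records: Sequence[FrozenSet[str]]) -> float:
    """Fraction of antecedent-matching records that also contain the target."""
    matched = [items for items in records if rule.antecedent <= items]
    if not matched:
        return 0.0
    hits = sum(1 for items in matched if rule.target in items)
    return hits / len(matched)


def filter_rules(
    rules: Sequence[Rule],
    records: Sequence[FrozenSet[str]],
    min_support: float,
    min_confidence: float,
) -> List[Rule]:
    """Keep only rules that meet both minimum thresholds."""
    return [
        r
        for r in rules
        if antecedent_support(r, records) >= min_support
        and confidence(r, records) >= min_confidence
    ]


# Toy data (an assumption for illustration): the class label "pos" is
# stored as an ordinary item alongside the attribute items "a" and "b".
records = [
    frozenset({"a", "b", "pos"}),
    frozenset({"a", "pos"}),
    frozenset({"a", "b"}),
    frozenset({"b"}),
]
rules = [
    Rule(frozenset({"a"}), "pos"),
    Rule(frozenset({"a", "b"}), "pos"),
    Rule(frozenset({"b"}), "pos"),
]

loose = filter_rules(rules, records, min_support=0.5, min_confidence=0.3)
strict = filter_rules(rules, records, min_support=0.5, min_confidence=0.6)
```

With these toy records, tightening only the confidence threshold shrinks the rule set from three rules to one, which is the kind of parameter-driven reduction the abstract compares against overlap suppression.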
Pages: 431-454
Number of pages: 24