The application and effectiveness of a multi-objective metaheuristic algorithm for partial classification

Cited by: 25
Authors
de la Iglesia, B [1]
Richards, G [1]
Philpott, MS [1]
Rayward-Smith, VJ [1]
Affiliation
[1] Univ E Anglia, Sch Comp Sci, Norwich NR4 7TJ, Norfolk, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
data mining; multi-objective metaheuristics; rule extraction; association rule discovery;
DOI
10.1016/j.ejor.2004.08.025
Chinese Library Classification number
C93 [Management science];
Discipline classification code
12 ; 1201 ; 1202 ; 120202 ;
Abstract
In this paper, we present an application of multi-objective metaheuristics to the field of data mining. We introduce the data mining task of nugget discovery (also known as partial classification) and show how the multi-objective metaheuristic algorithm NSGA II can be modified to solve this problem. We also present an alternative algorithm for the same task, the ARAC algorithm, which can find all rules that are best according to some measures of interest subject to certain constraints. The ARAC algorithm provides an excellent basis for comparison with the results of the multi-objective metaheuristic algorithm as it can deliver the Pareto optimal front consisting of all partial classification rules that lie in the upper confidence/coverage border, for databases of limited size. We present the results of experiments with various well-known databases for both algorithms. We also discuss how the two methods can be used complementarily for large databases to deliver a set of best rules according to some predefined criteria, providing a powerful tool for knowledge discovery in databases. © 2004 Elsevier B.V. All rights reserved.
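The "upper confidence/coverage border" mentioned in the abstract can be illustrated with a small sketch. It is not the paper's NSGA II or ARAC implementation. It only shows, under stated assumptions, how a Pareto front over two maximized objectives might be filtered from a set of candidate rules. The `Rule` type, the rule names, and the numeric values are hypothetical, and confidence and coverage are assumed to follow their usual definitions (fraction of matched records in the target class, and fraction of target-class records matched).

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    """Hypothetical container for a partial classification rule's scores."""
    name: str
    confidence: float  # assumed: matched records in target class / matched records
    coverage: float    # assumed: matched records in target class / target class size


def dominates(a: Rule, b: Rule) -> bool:
    """True if a is at least as good as b on both objectives and better on one."""
    at_least_as_good = a.confidence >= b.confidence and a.coverage >= b.coverage
    strictly_better = a.confidence > b.confidence or a.coverage > b.coverage
    return at_least_as_good and strictly_better


def pareto_front(rules: List[Rule]) -> List[Rule]:
    """Keep only rules that no other rule dominates (simple O(n^2) filter)."""
    return [r for r in rules if not any(dominates(o, r) for o in rules)]


# Made-up scores, for illustration only.
rules = [
    Rule("r1", 0.95, 0.10),
    Rule("r2", 0.80, 0.40),
    Rule("r3", 0.75, 0.35),  # dominated by r2
    Rule("r4", 0.60, 0.70),
    Rule("r5", 0.60, 0.50),  # dominated by r4
]
front = pareto_front(rules)
print([r.name for r in front])  # prints ['r1', 'r2', 'r4']
```

The quadratic filter is adequate for a handful of rules. A multi-objective algorithm such as NSGA II uses a more elaborate non-dominated sorting step, which this sketch does not attempt to reproduce.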
Pages: 898-917
Number of pages: 20