On the combination of evolutionary algorithms and stratified strategies for training set selection in data mining

Cited by: 58
Authors
Cano, JR
Herrera, F [1]
Lozano, M
Affiliations
[1] Univ Granada, Dept Comp Sci & Artificial Intelligence, E-18071 Granada, Spain
[2] Univ Jaen, Dept Comp Sci, Jaen 23071, Spain
Keywords
evolutionary algorithms; stratification; instance selection; training set selection; data mining
DOI
10.1016/j.asoc.2005.02.006
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present a new approach for training set selection in large data sets. The algorithm combines stratification with evolutionary algorithms. Stratification reduces the size of the domain where the selection is applied, while the evolutionary method selects the most representative instances. The performance of the proposal is compared with seven non-evolutionary algorithms, executed in stratified form. The analysis follows two evaluation approaches: the balance between reduction and accuracy of the selected subsets, and the balance between interpretability and accuracy of the representation models associated with these subsets. The algorithms have been assessed on large and huge data sets. The study shows that stratified evolutionary instance selection consistently outperforms the non-evolutionary methods. The main advantages are high instance reduction rates, high classification accuracy, and models with high interpretability. © 2005 Elsevier B.V. All rights reserved.
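The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the random round-robin stratification, the simple elitist genetic algorithm, the 1-NN classifier, the fitness weighting `alpha`, and every function and parameter name below are assumptions chosen for illustration only.

```python
import random


def one_nn_accuracy(subset, data):
    """Fraction of `data` classified correctly by 1-NN over `subset`.

    Selected instances match themselves at distance zero, so this is an
    optimistic estimate (an assumption made to keep the sketch short).
    """
    if not subset:
        return 0.0
    correct = 0
    for x, y in data:
        nearest = min(
            subset, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x))
        )
        correct += nearest[1] == y
    return correct / len(data)


def fitness(mask, stratum, alpha=0.5):
    """Hypothetical fitness: weighted sum of accuracy and reduction rate."""
    subset = [inst for inst, keep in zip(stratum, mask) if keep]
    reduction = 1.0 - len(subset) / len(stratum)
    return alpha * one_nn_accuracy(subset, stratum) + (1 - alpha) * reduction


def evolve_mask(stratum, rng, pop_size=10, generations=20, p_mut=0.05):
    """Evolve a boolean keep/drop mask over one stratum (simple elitist GA)."""
    n = len(stratum)
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, stratum), reverse=True)
        parents = pop[: pop_size // 2]  # the best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda m: fitness(m, stratum))


def stratified_selection(data, n_strata, seed=0):
    """Split `data` into disjoint strata, select per stratum, merge results."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    # Assumption: random round-robin strata; class proportions are not enforced.
    strata = [shuffled[i::n_strata] for i in range(n_strata)]
    selected = []
    for stratum in strata:
        mask = evolve_mask(stratum, rng)
        selected.extend(inst for inst, keep in zip(stratum, mask) if keep)
    return selected
```

Here each instance is assumed to be a `(features, label)` tuple, so a call such as `stratified_selection(data, n_strata=3)` returns a subset of those tuples. Because each evolutionary run only sees one stratum, the search space per run shrinks with the number of strata, which is the effect the abstract attributes to stratification.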
Pages: 323-332
Number of pages: 10
Related papers
33 in total
[1]  
[Anonymous], P 4 INT WORKSH MACH
[2]  
Back T., 1997, Handbook of evolutionary computation
[3]  
BLAKE C, 1998, UCI REPOSITORY MACH
[4]   Advances in instance selection for instance-based learning algorithms [J].
Brighton, H ;
Mellish, C .
DATA MINING AND KNOWLEDGE DISCOVERY, 2002, 6 (02) :153-172
[5]   Using evolutionary algorithms as instance selection for data reduction in KDD: An experimental study [J].
Cano, JR ;
Herrera, F ;
Lozano, M .
IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2003, 7 (06) :561-575
[6]  
CANO JR, UNPUB PATTERN REC LE
[7]   Design of nearest neighbor classifiers using an intelligent multi-objective evolutionary algorithm [J].
Chen, JH ;
Chen, HM ;
Ho, SY .
PRICAI 2004: TRENDS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2004, 3157 :262-271
[8]  
Eshelman L. J., 1991, FDN GENETIC ALGORITH, V1, P265, DOI 10.1016/B978-0-08-050684-5.50020-3
[9]  
Freitas A.A., 2002, NAT COMP SER
[10]   Building decision trees with constraints [J].
Garofalakis, M ;
Hyun, DJ ;
Rastogi, R ;
Shim, K .
DATA MINING AND KNOWLEDGE DISCOVERY, 2003, 7 (02) :187-214