Feature selection algorithms: A survey and experimental evaluation

Cited by: 359
Authors
Molina, LC [1]
Belanche, L [1]
Nebot, A [1]
Affiliations
[1] Univ Politecn Catalunya, Dept Llenguatges & Sistemes Informat, ES-08034 Barcelona, Spain
Source
2002 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS | 2002
Keywords
DOI
10.1109/ICDM.2002.1183917
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Given the substantial number of existing feature selection algorithms, criteria are needed for deciding which algorithm to use in a given situation. This work assesses the performance of several fundamental algorithms from the literature in a controlled scenario. A scoring measure ranks the algorithms by taking into account the amount of relevance, irrelevance, and redundancy in sample data sets. This measure computes the degree of matching between the output of the algorithm and the known optimal solution. Sample size effects are also studied.
Pages: 306-313
Page count: 8