Consistency-based search in feature selection

Cited by: 632
Authors
Dash, M
Liu, HA
Affiliations
[1] Northwestern Univ, Evanston, IL 60208 USA
[2] Arizona State Univ, Dept Comp Sci & Engn, Tempe, AZ 85287 USA
Keywords
classification; feature selection; evaluation measures; search strategies; random search; branch and bound
DOI
10.1016/S0004-3702(03)00079-1
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection is an effective technique for dimensionality reduction. For classification, it is used to find an "optimal" subset of relevant features such that the overall classification accuracy increases while the data size is reduced and comprehensibility is improved. Feature selection methods involve two important aspects: evaluation of a candidate feature subset and search through the feature space. Existing algorithms adopt various measures to evaluate the goodness of feature subsets. This work focuses on the inconsistency measure, according to which a feature subset is inconsistent if there exist at least two instances with the same feature values but different class labels. We compare the inconsistency measure with other measures and study different search strategies, such as exhaustive, complete, heuristic, and random search, that can be applied to this measure. We conduct an empirical study to examine the pros and cons of these search methods, give some guidelines on choosing a search method, and compare the classifier error rates before and after feature selection. (C) 2003 Elsevier B.V. All rights reserved.
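The inconsistency measure and two of the search strategies named in the abstract can be made concrete with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation. `is_inconsistent` follows the abstract's binary definition directly. `inconsistency_rate` is an assumed graded variant: for each group of instances that share the same values on the chosen features, it counts the instances outside that group's majority class, then divides the total by the number of instances. The threshold `delta`, the toy data, and all function names are hypothetical. The sketch shows an exhaustive search (subsets tried in order of increasing size) and a simple random search; complete (branch and bound) and heuristic search are not shown.

```python
import random
from collections import Counter, defaultdict
from itertools import combinations
from typing import Hashable, Iterable, Optional, Sequence, Tuple

Row = Sequence[Hashable]


def is_inconsistent(X: Sequence[Row], y: Sequence[Hashable], subset: Iterable[int]) -> bool:
    """Binary definition from the abstract: some two instances agree on every
    feature in `subset` but carry different class labels."""
    cols = tuple(subset)
    first_label = {}
    for row, label in zip(X, y):
        key = tuple(row[j] for j in cols)
        # setdefault returns the label already stored for this pattern, if any
        if first_label.setdefault(key, label) != label:
            return True
    return False


def inconsistency_rate(X: Sequence[Row], y: Sequence[Hashable], subset: Iterable[int]) -> float:
    """ASSUMED graded form: per matching pattern, (#instances - majority-class
    count), summed over all patterns and divided by the number of instances."""
    cols = tuple(subset)
    groups = defaultdict(Counter)
    for row, label in zip(X, y):
        groups[tuple(row[j] for j in cols)][label] += 1
    clashes = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return clashes / len(y)


def exhaustive_search(X, y, delta: float = 0.0) -> Optional[Tuple[int, ...]]:
    """Try every subset, smallest first; return the first whose rate <= delta
    (delta is a hypothetical tolerance parameter)."""
    n = len(X[0])
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            if inconsistency_rate(X, y, subset) <= delta:
                return subset
    return None  # even the full feature set exceeds delta


def random_search(X, y, delta: float = 0.0, max_tries: int = 1000, seed: int = 0):
    """Sample random subsets no larger than the best found so far and keep the
    smallest one whose rate <= delta. No optimality guarantee."""
    rng = random.Random(seed)
    n = len(X[0])
    best = tuple(range(n)) if inconsistency_rate(X, y, range(n)) <= delta else None
    for _ in range(max_tries):
        k = rng.randint(0, n if best is None else len(best))
        subset = tuple(sorted(rng.sample(range(n), k)))
        if inconsistency_rate(X, y, subset) <= delta and (best is None or len(subset) < len(best)):
            best = subset
    return best


# Hypothetical toy data: class = feature 0 XOR feature 1; feature 2 = NOT feature 0.
X = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
y = [0, 1, 1, 0]

print(is_inconsistent(X, y, (0,)))     # True
print(inconsistency_rate(X, y, (0,)))  # 0.5
print(exhaustive_search(X, y))         # (0, 1)
print(random_search(X, y))
```

Exhaustive search evaluates up to 2^n subsets, so it is only practical for a small number of features; the random variant bounds the number of evaluations but gives up the guarantee of finding the smallest consistent subset. On the toy data both (0, 1) and (1, 2) are consistent, because feature 2 carries the same information as feature 0, so the random search may return either.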
Pages: 155-176
Number of pages: 22