Mixed feature selection based on granulation and approximation

Cited by: 213
Authors
Hu, Qinghua [1]
Liu, Jinfu [1]
Yu, Daren [1]
Affiliations
[1] Harbin Inst Technol, Harbin 150001, Peoples R China
Keywords
feature selection; numerical feature; categorical feature; delta neighborhood; k-nearest-neighbor; rough sets
DOI
10.1016/j.knosys.2007.07.001
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Feature subset selection is a common challenge in applications where data with tens or hundreds of features are available. Existing feature selection algorithms are mainly designed for either numerical or categorical attributes. However, data in real-world applications usually come in a mixed format. In this paper, we generalize Pawlak's rough set model into a delta neighborhood rough set model and a k-nearest-neighbor rough set model, where objects with numerical attributes are granulated with delta neighborhood relations or k-nearest-neighbor relations, while objects with categorical features are granulated with equivalence relations. The induced information granules are then used to approximate the decision with lower and upper approximations. We compute the lower approximations of the decision to measure the significance of attributes. Based on the proposed models, we define the significance of mixed features and construct a greedy attribute reduction algorithm. We compare the proposed algorithm with others in terms of the number of selected features and classification performance. Experiments show the proposed technique is effective. (C) 2007 Elsevier B.V. All rights reserved.
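
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`granule`, `dependency`, `greedy_reduct`), the per-feature absolute-difference test for numerical attributes, the single shared `delta`, and the stopping threshold `eps` are all assumptions made for illustration, and only the delta neighborhood variant is shown. Significance is approximated here by the fraction of objects whose granule lies entirely inside one decision class, i.e. the relative size of the lower approximation of the decision.

```python
from typing import List, Sequence


def granule(data: Sequence[Sequence], i: int, feats: Sequence[int],
            is_numeric: Sequence[bool], delta: float) -> List[int]:
    """Indices of objects that fall in the same granule as object i.

    Assumption: numerical features match when they differ by at most
    `delta` on every selected feature; categorical features must be equal.
    """
    ref = data[i]
    members = []
    for j, row in enumerate(data):
        same = all(
            abs(row[f] - ref[f]) <= delta if is_numeric[f] else row[f] == ref[f]
            for f in feats
        )
        if same:
            members.append(j)
    return members


def dependency(data: Sequence[Sequence], labels: Sequence, feats: Sequence[int],
               is_numeric: Sequence[bool], delta: float) -> float:
    """Fraction of objects whose granule contains a single decision class."""
    if not data:
        return 0.0
    consistent = 0
    for i in range(len(data)):
        members = granule(data, i, feats, is_numeric, delta)
        if all(labels[j] == labels[i] for j in members):
            consistent += 1
    return consistent / len(data)


def greedy_reduct(data: Sequence[Sequence], labels: Sequence,
                  is_numeric: Sequence[bool], delta: float,
                  eps: float = 1e-9) -> List[int]:
    """Forward greedy search: add the feature with the largest dependency gain.

    Stops when no remaining feature improves dependency by more than `eps`
    (a hypothetical stopping rule chosen for this sketch).
    """
    selected: List[int] = []
    best = 0.0
    remaining = list(range(len(is_numeric)))
    while remaining:
        top_feat, top_score = None, best
        for f in remaining:
            score = dependency(data, labels, selected + [f], is_numeric, delta)
            if score > top_score + eps:
                top_feat, top_score = f, score
        if top_feat is None:
            break
        selected.append(top_feat)
        remaining.remove(top_feat)
        best = top_score
    return selected


if __name__ == "__main__":
    # Toy mixed data: feature 0 numerical, feature 1 categorical, feature 2 numerical.
    data = [(0.10, "a", 0.50), (0.15, "a", 0.90), (0.80, "b", 0.52),
            (0.85, "b", 0.88), (0.12, "b", 0.10), (0.82, "a", 0.12)]
    labels = [0, 0, 1, 1, 0, 1]
    print(greedy_reduct(data, labels, [True, False, True], delta=0.2))
```

In this toy data the decision is fully determined by the first numerical feature, so the search would be expected to stop after selecting it. A k-nearest-neighbor variant would replace the `delta` test in `granule` with a ranking of distances; that variant is not sketched here.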
Pages: 294-304
Number of pages: 11