Feature selection based on a modified fuzzy C-means algorithm with supervision

Cited by: 36
Author
Marcelloni, F. [1]
Affiliation
[1] Univ Pisa, Dipartimento Ingn Informaz Elettr Informat Teleco, I-56122 Pisa, Italy
Keywords
feature selection; fuzzy C-means; k-nearest neighbors; supervised learning
DOI
10.1016/S0020-0255(02)00402-4
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper we propose a new approach to feature selection based on a modified fuzzy C-means algorithm with supervision (MFCMS). MFCMS complements the unsupervised learning of classical fuzzy C-means with labeled patterns. The labeled patterns allow MFCMS to accurately model the shape of each cluster and consequently to highlight the features that are particularly effective in characterizing a cluster. These features are distinguished by a low variance of their values over the patterns with a high membership degree in the cluster. If, with respect to these features, the distance between the prototype of the cluster and the prototypes of the other clusters is large, then these features discriminate between the cluster and the other clusters. To take these two aspects into account, for each cluster and each feature we introduce a purpose-built index: the higher the value of the index, the higher the discrimination capability of the feature for the cluster. We execute MFCMS on the training set, treating all patterns as labeled. Then we retain the features that are associated, for at least one cluster, with an index larger than a threshold T. We applied MFCMS to several real-world pattern classification benchmarks, using the well-known k-nearest neighbors classifier as the learning algorithm. We show that feature selection performed by MFCMS improved generalization on all data sets. (C) 2002 Elsevier Science Inc. All rights reserved.
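The selection rule described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula for the index, so the ratio used here (per-feature squared distance between prototypes divided by membership-weighted variance) is only an assumption chosen to reflect the two stated aspects; it is not the paper's definition. Likewise, deriving crisp memberships directly from the labels stands in for running MFCMS itself, and the function names `labels_to_memberships`, `feature_indices`, and `select_features` are hypothetical.

```python
import numpy as np

def labels_to_memberships(y, c):
    """Crisp membership matrix (c, n) built from integer labels 0..c-1.
    Assumption: stands in for the memberships MFCMS would produce
    when every training pattern is treated as labeled."""
    n = len(y)
    U = np.zeros((c, n))
    U[y, np.arange(n)] = 1.0
    return U

def feature_indices(X, U, m=2.0, eps=1e-12):
    """Illustrative per-cluster, per-feature index of shape (c, d).
    X: (n, d) patterns, U: (c, n) memberships, c >= 2.
    Assumed form: prototype separation / membership-weighted variance."""
    W = U ** m                                    # fuzzified memberships
    norm = W.sum(axis=1, keepdims=True)           # (c, 1)
    V = (W @ X) / norm                            # (c, d) cluster prototypes
    # Membership-weighted variance of each feature around each prototype.
    diff2 = (X[None, :, :] - V[:, None, :]) ** 2  # (c, n, d)
    var = (W[:, :, None] * diff2).sum(axis=1) / norm
    # Mean squared distance, per feature, to the other clusters' prototypes.
    c = V.shape[0]
    sep = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=1) / (c - 1)
    return sep / (var + eps)

def select_features(index, T):
    """Keep features whose index exceeds T for at least one cluster."""
    return np.flatnonzero((index > T).any(axis=0))
```

A feature with a tight spread inside a cluster and a prototype far from the other prototypes gets a large value under this assumed index, so it survives the threshold; the retained columns could then be passed to a k-nearest neighbors classifier, as the abstract describes.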
Pages: 201-226
Number of pages: 26