Feature selection, mutual information, and the classification of high-dimensional patterns

Cited by: 53
Authors
Bonev, Boyan [1]
Escolano, Francisco [1]
Cazorla, Miguel [1]
Affiliations
[1] Univ Alicante, Dept Ciencia Computac & Inteligencia Artificial, E-03080 Alicante, Spain
Keywords
filter feature selection; mutual information; entropic spanning graphs; microarray
DOI
10.1007/s10044-008-0107-0
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a novel feature selection filter for supervised learning, which relies on the efficient estimation of the mutual information between a high-dimensional set of features and the classes. We bypass the estimation of the probability density function with the aid of the entropic-graphs approximation of Renyi entropy, and the subsequent approximation of the Shannon entropy. Thus, the complexity does not depend on the number of dimensions but on the number of patterns/samples, and the curse of dimensionality is circumvented. We show that it is then possible to outperform algorithms which individually rank features, as well as a greedy algorithm based on the maximal relevance and minimal redundancy criterion. We successfully test our method both in the contexts of image classification and microarray data classification. For most of the tested data sets, we obtain better classification results than those reported in the literature.
Pages: 309-319
Number of pages: 11