Stability of feature selection algorithms: a study on high-dimensional spaces

Cited by: 482
Authors
Kalousis, Alexandros [1]
Prados, Julien [1]
Hilario, Melanie [1]
Affiliations
[1] Univ Geneva, Dept Comp Sci, CH-1211 Geneva 4, Switzerland
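Keywords
feature selection; high dimensionality; feature stability
DOI
10.1007/s10115-006-0040-8
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104 [Pattern recognition and intelligent systems]; 0812 [Computer science and technology]; 0835 [Software engineering]; 1405 [Intelligent science and technology]
Abstract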
With the proliferation of extremely high-dimensional data, feature selection algorithms have become indispensable components of the learning process. Strangely, despite extensive work on the stability of learning algorithms, the stability of feature selection algorithms has been relatively neglected. This study is an attempt to fill that gap by quantifying the sensitivity of feature selection algorithms to variations in the training set. We assess the stability of feature selection algorithms based on the stability of the feature preferences that they express in the form of weights (scores), ranks, or a selected feature subset. We examine a number of measures to quantify the stability of feature preferences and propose an empirical way to estimate them. We perform a series of experiments with several feature selection algorithms on a set of proteomics datasets. The experiments allow us to explore the merits of each stability measure and create stability profiles of the feature selection algorithms. Finally, we show how stability profiles can support the choice of a feature selection algorithm.
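The idea of measuring how much a selected feature subset changes when the training set varies can be illustrated with a small sketch. The abstract does not name the specific measures or the resampling scheme, so everything here is an assumption made for illustration: Jaccard similarity as the subset-overlap measure, random subsampling without replacement as the source of training-set variation, and a toy selector (`select_top_k`, a hypothetical helper) that ranks features by the absolute difference of class means.

```python
import random
from itertools import combinations


def jaccard(a, b):
    """Overlap of two feature subsets: |a & b| / |a | b| (assumed measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_top_k(rows, labels, k):
    """Toy selector (hypothetical): keep the k features whose class means differ most."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        if not pos or not neg:
            scores.append(0.0)  # a class is missing in this subsample
        else:
            scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def subset_stability(rows, labels, k, n_resamples=10, fraction=0.9, seed=0):
    """Average pairwise Jaccard similarity of subsets selected on random subsamples."""
    if n_resamples < 2:
        raise ValueError("need at least two resamples to compare subsets")
    rng = random.Random(seed)
    n = len(rows)
    size = max(2, int(fraction * n))
    subsets = []
    for _ in range(n_resamples):
        idx = rng.sample(range(n), size)  # subsample without replacement
        subsets.append(
            select_top_k([rows[i] for i in idx], [labels[i] for i in idx], k)
        )
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A value near 1 means the selector picks almost the same features on every subsample, while a value near 0 means the selected subset depends heavily on which training examples were drawn. Analogous sketches for weights or ranks would swap the Jaccard overlap for a correlation between weight or rank vectors.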
Pages: 95-116
Page count: 22