Entropy-based gene ranking without selection bias for the predictive classification of microarray data

Cited by: 98
Authors
Furlanello, C [1]
Serafini, M [1]
Merler, S [1]
Jurman, G [1]
Affiliations
[1] ITC Irst, Trento, Italy
DOI
10.1186/1471-2105-4-54
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Background: We describe the E-RFE method for gene ranking, which is useful for identifying markers in the predictive classification of array data. The method supports a practical modeling scheme designed to avoid building classification rules on gene subsets that are too small, an effect known as selection bias, in which the estimated predictive errors are too optimistic because the test samples were already used in the feature selection process.
Results: E-RFE speeds up recursive feature elimination (RFE) with SVM classifiers by eliminating chunks of uninteresting genes according to an entropy measure of the SVM weight distribution. An optimal subset of genes is selected with a two-strata model evaluation procedure: modeling is replicated by an external stratified-partition resampling scheme, and within each run an internal K-fold cross-validation is used for E-RFE ranking. The optimal number of genes can also be estimated from the saturation of Zipf's law profiles.
Conclusions: Without a decrease in classification accuracy, E-RFE achieves a speed-up factor of 100 with respect to standard RFE, while improving on alternative parametric RFE reduction strategies. This makes a process for gene selection and error estimation practical, ensuring control of the selection bias and providing additional diagnostic indicators of gene importance.
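The entropy-guided elimination mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the number of bins, the entropy threshold, and the exact dropping rule are all assumptions made for illustration. The gene weights are taken as given (for example, squared weights from a linear SVM), so no SVM is trained here.

```python
import math


def weight_entropy(weights, n_bins=10):
    """Shannon entropy (in bits) of the histogram of weights rescaled to [0, 1]."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return 0.0  # all weights identical: a single occupied bin
    counts = [0] * n_bins
    for w in weights:
        scaled = (w - lo) / (hi - lo)
        # The maximum weight maps to 1.0, so clamp it into the last bin.
        counts[min(int(scaled * n_bins), n_bins - 1)] += 1
    total = len(weights)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def genes_to_drop(weights, n_bins=10, entropy_threshold=None):
    """Indices of genes to eliminate in one step.

    Hypothetical rule (an assumption, not the published one): when the
    weight histogram has low entropy, most genes share similar weights,
    so the whole lowest bin is dropped as one chunk; otherwise a single
    gene is dropped, as in standard RFE.
    """
    if entropy_threshold is None:
        entropy_threshold = 0.5 * math.log2(n_bins)  # assumed default
    order = sorted(range(len(weights)), key=lambda i: weights[i])
    if weight_entropy(weights, n_bins) < entropy_threshold:
        lo, hi = min(weights), max(weights)
        cutoff = lo + (hi - lo) / n_bins
        chunk = [i for i in order if weights[i] < cutoff]
        if chunk:
            return chunk[: len(weights) - 1]  # always keep at least one gene
    return order[:1]


# Six genes with near-zero weights and two clearly informative genes.
skewed = [0.01, 0.02, 0.015, 0.03, 0.02, 0.01, 0.9, 1.0]
print(sorted(genes_to_drop(skewed)))
```

In a full ranking loop, a step like this would be repeated after retraining the classifier on the surviving genes, and the whole loop would run inside the internal cross-validation so that held-out samples never influence the ranking.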
Pages: 20
Related papers
28 items in total
[11]   Classification and prediction of survival in patients with the leukemic phase of cutaneous T cell lymphoma [J].
Kari, L ;
Loboda, A ;
Nebozhyn, M ;
Rook, AH ;
Vonderheid, EC ;
Nichols, C ;
Virok, D ;
Chang, C ;
Horng, WH ;
Johnston, J ;
Wysocka, M ;
Showe, MK ;
Showe, LC .
JOURNAL OF EXPERIMENTAL MEDICINE, 2003, 197 (11) :1477-1488
[12]   Zipf's law in importance of genes for cancer classification using microarray data [J].
Li, WT ;
Yang, YN .
JOURNAL OF THEORETICAL BIOLOGY, 2002, 219 (04) :539-551
[13]   Bayesian automatic relevance determination algorithms for classifying gene expression data [J].
Li, Y ;
Campbell, C ;
Tipping, M .
BIOINFORMATICS, 2002, 18 (10) :1332-1339
[14]   Tumor classification by partial least squares using microarray gene expression data [J].
Nguyen, DV ;
Rocke, DM .
BIOINFORMATICS, 2002, 18 (01) :39-50
[15]   Nutt CL, 2003, CANCER RES, V63, P1602
[16]   A molecular signature of metastasis in primary solid tumors [J].
Ramaswamy, S ;
Ross, KN ;
Lander, ES ;
Golub, TR .
NATURE GENETICS, 2003, 33 (01) :49-54
[17]   Multiclass cancer diagnosis using tumor gene expression signatures [J].
Ramaswamy, S ;
Tamayo, P ;
Rifkin, R ;
Mukherjee, S ;
Yeang, CH ;
Angelo, M ;
Ladd, C ;
Reich, M ;
Latulippe, E ;
Mesirov, JP ;
Poggio, T ;
Gerald, W ;
Loda, M ;
Lander, ES ;
Golub, TR .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2001, 98 (26) :15149-15154
[18]   Basic microarray analysis: grouping and feature reduction [J].
Raychaudhuri, S ;
Sutphin, PD ;
Chang, JT ;
Altman, RB .
TRENDS IN BIOTECHNOLOGY, 2001, 19 (05) :189-193
[19]   Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification [J].
Simon, R ;
Radmacher, MD ;
Dobbin, K ;
McShane, LM .
JOURNAL OF THE NATIONAL CANCER INSTITUTE, 2003, 95 (01) :14-18
[20]   From patterns to pathways: gene expression data analysis comes of age [J].
Slonim, DK .
NATURE GENETICS, 2002, 32 (Suppl 4) :502-508