Multiple SVM-RFE for gene selection in cancer classification with expression data

Times cited: 328
Authors
Duan, KB
Rajapakse, JC [1]
Wang, HY
Azuaje, F
Affiliations
[1] Nanyang Technol Univ, Sch Comp Engn, BioInformat Res Ctr, Singapore 639798, Singapore
[2] Univ Ulster, Sch Comp & Math, Jordanstown, North Ireland
[3] Univ Ulster, Comp Sci Res Inst, Jordanstown, North Ireland
Funding
UK Medical Research Council
Keywords
cancer classification; feature selection; gene expression; gene ontology; semantic similarity; support vector machine recursive feature elimination (SVM-RFE)
DOI
10.1109/TNB.2005.853657
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
This paper proposes a new feature selection method that uses a backward elimination procedure similar to that implemented in support vector machine recursive feature elimination (SVM-RFE). Unlike the SVM-RFE method, at each step, the proposed approach computes the feature ranking score from a statistical analysis of weight vectors of multiple linear SVMs trained on subsamples of the original training data. We tested the proposed method on four gene expression datasets for cancer classification. The results show that the proposed feature selection method selects better gene subsets than the original SVM-RFE and improves the classification accuracy. A Gene Ontology-based similarity assessment indicates that the selected subsets are functionally diverse, further validating our gene selection method. This investigation also suggests that, for gene expression-based cancer classification, average test error from multiple partitions of training and test sets can be recommended as a reference of performance quality.
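The backward elimination described in the abstract can be illustrated with a minimal sketch. Several points here are assumptions rather than details taken from the record: the abstract does not say which statistic is computed over the weight vectors, so the sketch scores each feature by the mean of its normalized squared weights divided by their standard deviation; one feature is removed per step; the linear SVM is replaced by a simple Pegasos-style subgradient trainer so the example needs only the standard library; and all names (`train_linear_svm`, `multiple_svm_rfe`, `make_toy_data`) and parameter values are hypothetical.

```python
import random
import statistics


def train_linear_svm(X, y, lam=0.01, epochs=30, seed=0):
    """Stand-in linear SVM (assumption): Pegasos-style subgradient descent
    on the regularized hinge loss, without a bias term."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrinkage
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def multiple_svm_rfe(X, y, n_keep, n_subsamples=10, frac=0.8, seed=0):
    """Backward elimination driven by several SVMs trained on subsamples.

    Returns (surviving feature indices, eliminated indices in removal order).
    """
    rng = random.Random(seed)
    surviving = list(range(len(X[0])))
    eliminated = []
    while len(surviving) > n_keep:
        squared_weights = []  # one row of normalized w_k^2 per subsample
        for _ in range(n_subsamples):
            size = max(2, int(frac * len(X)))
            idx = rng.sample(range(len(X)), size)
            Xs = [[X[i][j] for j in surviving] for i in idx]
            ys = [y[i] for i in idx]
            w = train_linear_svm(Xs, ys, seed=rng.randrange(10**6))
            norm = sum(wj * wj for wj in w) ** 0.5 or 1.0
            squared_weights.append([(wj / norm) ** 2 for wj in w])
        # Assumed ranking statistic: mean / standard deviation across the SVMs.
        scores = []
        for k in range(len(surviving)):
            values = [row[k] for row in squared_weights]
            mean = sum(values) / len(values)
            scores.append(mean / (statistics.pstdev(values) + 1e-12))
        worst = min(range(len(surviving)), key=scores.__getitem__)
        eliminated.append(surviving.pop(worst))  # drop the lowest-ranked feature
    return surviving, eliminated


def make_toy_data(n_samples=40, n_noise=6, seed=1):
    """Synthetic stand-in for expression data: features 0 and 1 carry the
    class signal, the remaining features are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = 1 if i % 2 == 0 else -1
        informative = [label * 2.0 + rng.gauss(0, 0.5) for _ in range(2)]
        noise = [rng.gauss(0, 1.0) for _ in range(n_noise)]
        X.append(informative + noise)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    kept, dropped = multiple_svm_rfe(X, y, n_keep=2)
    print("kept:", kept, "eliminated in order:", dropped)
```

On real expression data one would likely swap the stand-in trainer for an established linear SVM implementation and remove features in larger chunks while many thousands of genes remain; both choices are outside what the abstract specifies.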
Pages: 228-234
Page count: 7