Improved binary PSO for feature selection using gene expression data

Cited by: 404
Authors
Chuang, Li-Yeh [2]
Chang, Hsueh-Wei [3, 4]
Tu, Chung-Jui [1]
Yang, Cheng-Hong [1]
Affiliations
[1] Natl Kaohsiung Univ Appl Sci, Dept Elect Engn, Kaohsiung 807, Taiwan
[2] I Shou Univ, Dept Chem Engn, Kaohsiung 840, Taiwan
[3] Kaohsiung Med Univ, Coll Pharm, Dept Biomed Sci & Environm Biol, Kaohsiung 807, Taiwan
[4] Kaohsiung Med Univ, Coll Pharm, Grad Inst Nat Prod, Kaohsiung 807, Taiwan
Keywords
improved binary particle swarm optimization; feature selection; gene expression data
DOI
10.1016/j.compbiolchem.2007.09.005
Chinese Library Classification number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Gene expression profiles, which represent the state of a cell at the molecular level, have great potential as a medical diagnostic tool. In cancer type classification, available training data sets generally have a small sample size relative to the number of genes involved. This limitation poses a challenge to certain classification methodologies. A reliable method for selecting the genes relevant to sample classification is needed to speed up processing, decrease the predictive error rate, and avoid the incomprehensibility caused by the large number of genes investigated. In this study, improved binary particle swarm optimization (IBPSO) is used to implement feature selection, and the K-nearest neighbor (K-NN) method serves as an evaluator of the IBPSO for gene expression data classification problems. Experimental results show that this method effectively simplifies feature selection and reduces the total number of features needed. Compared with the best results previously published, the proposed method achieves the highest classification accuracy in nine of the 11 gene expression data test problems and accuracy comparable to the best published results in the other two. © 2007 Elsevier Ltd. All rights reserved.
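The wrapper idea in the abstract (a binary swarm proposes gene subsets, a nearest-neighbor classifier scores them) can be illustrated with a minimal sketch. The abstract does not say what modification makes the PSO "improved", so the code below uses the standard sigmoid-based binary PSO update instead, with leave-one-out 1-NN accuracy as the fitness. The function names, parameter values, and the choice of K = 1 are all assumptions made for illustration, not the authors' settings.

```python
import numpy as np


def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out 1-NN accuracy using only the features where mask is True."""
    if not mask.any():
        return 0.0  # an empty feature subset cannot classify anything
    Xs = X[:, mask]
    # Pairwise squared Euclidean distances between all samples.
    d = ((Xs[:, None, :] - Xs[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a sample may not be its own neighbor
    nearest = d.argmin(axis=1)
    return float((y[nearest] == y).mean())


def binary_pso_select(X, y, n_particles=20, n_iter=30,
                      w=0.9, c1=2.0, c2=2.0, v_max=6.0, seed=0):
    """Standard binary PSO (not the paper's improved variant).

    Returns (boolean feature mask, leave-one-out 1-NN accuracy of that mask).
    All default parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]

    # Positions are 0/1 vectors: 1 means the feature (gene) is selected.
    pos = (rng.random((n_particles, n_features)) < 0.5).astype(int)
    vel = rng.uniform(-1.0, 1.0, (n_particles, n_features))

    pbest = pos.copy()
    pbest_fit = np.array([loo_1nn_accuracy(X, y, p.astype(bool)) for p in pos])
    g = pbest_fit.argmax()
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    for _ in range(n_iter):
        r1 = rng.random(vel.shape)
        r2 = rng.random(vel.shape)
        # Velocity update pulls each particle toward its own best and the swarm best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        vel = np.clip(vel, -v_max, v_max)
        # The sigmoid turns each velocity into the probability that the bit is 1.
        prob = 1.0 / (1.0 + np.exp(-vel))
        pos = (rng.random(vel.shape) < prob).astype(int)

        fit = np.array([loo_1nn_accuracy(X, y, p.astype(bool)) for p in pos])
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]

        g = pbest_fit.argmax()
        if pbest_fit[g] > gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    return gbest.astype(bool), float(gbest_fit)
```

Calling `binary_pso_select(X, y)` on a samples-by-genes matrix `X` with class labels `y` returns the best subset found and its leave-one-out accuracy. This fitness rewards accuracy only; a term penalizing the number of selected features would be one way to push toward smaller subsets, though whether the paper does so is not stated in the abstract.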
Pages: 29-38
Page count: 10