Gene selection for cancer identification: a decision tree model empowered by particle swarm optimization algorithm

Cited by: 90
Authors
Chen, Kun-Huang [1]
Wang, Kung-Jeng [1]
Tsai, Min-Lung [2]
Wang, Kung-Min [3]
Adrian, Angelia Melani [1]
Cheng, Wei-Chung [4,5]
Yang, Tzu-Sen [6,7]
Teng, Nai-Chia [8]
Tan, Kuo-Pin [9]
Chang, Ku-Shang [2]
Affiliations
[1] Natl Taiwan Univ Sci & Technol, Dept Ind Management, Taipei 106, Taiwan
[2] Yuanpei Univ, Dept Food Sci, Hsinchu 300, Taiwan
[3] Shin Kong Wu Ho Mem Hosp, Dept Surg, Taipei, Taiwan
[4] Cheng Hsin Gen Hosp, Dept Surg, Taipei 11220, Taiwan
[5] Natl Yang Ming Univ, Genom Res Ctr, Taipei 11221, Taiwan
[6] Taipei Med Univ, Sch Dent Technol, Taipei 110, Taiwan
[7] Taipei Med Univ, Taiwan Res Ctr Biomed Implants & Microsurg Dev, Taipei 110, Taiwan
[8] Taipei Med Univ, Coll Oral Med, Sch Dent, Taipei, Taiwan
[9] Natl Taiwan Univ Sci & Technol, Sch Management, MBA, Taipei 106, Taiwan
Keywords
Gene expression; Cancer; Particle swarm optimization; Decision tree classifier; SUPPORT VECTOR MACHINE; DISCRIMINATION METHODS; NEAREST NEIGHBOR; EXPRESSION DATA; CLASSIFICATION; PREDICTION; INFERENCE; HYBRID;
DOI
10.1186/1471-2105-15-49
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
070307 [Chemical Biology];
Abstract
Background: In microarray data analysis, selecting a small number of informative genes, out of thousands, that may contribute to the occurrence of cancers is an important issue. Many researchers use various computational intelligence methods to analyze gene expression data. Results: To achieve efficient selection, from thousands of candidate genes, of those that can contribute to identifying cancers, this study develops a novel method that combines particle swarm optimization with a decision tree as the classifier. The study also compares the performance of the proposed method with other well-known benchmark classification methods (support vector machine, self-organizing map, back propagation neural network, C4.5 decision tree, Naive Bayes, CART decision tree, and artificial immune recognition system) and conducts experiments on 11 gene expression cancer datasets. Conclusion: Based on statistical analysis, the proposed method outperforms other popular classifiers for all test datasets, and is comparable to SVM for certain specific datasets. Further, housekeeping genes with various expression patterns and tissue-specific genes are identified. These genes provide high discrimination power for cancer classification.
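The general idea named in the abstract, particle swarm optimization searching over gene subsets with a decision tree scoring each subset, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the PSO variant, the tree algorithm settings, the fitness function, or any parameters. The sketch assumes a binary PSO with a sigmoid transfer function, a small greedy Gini-based tree, a hold-out accuracy fitness with a subset-size penalty, and synthetic two-class data. All function names and parameter values (`fit_tree`, `fitness`, `binary_pso`, `make_toy_data`, `penalty`, `w`, `c1`, `c2`) are hypothetical.

```python
import math
import random

def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def fit_tree(X, y, features, depth=2):
    """Greedy, depth-limited tree restricted to the given gene indices."""
    majority = int(2 * sum(y) >= len(y))
    if depth == 0 or len(set(y)) < 2:
        return majority  # leaf: predicted class
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2.0  # candidate threshold between neighbouring values
            left = [c for row, c in zip(X, y) if row[f] <= t]
            right = [c for row, c in zip(X, y) if row[f] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return majority
    _, f, t = best
    lo = [i for i, row in enumerate(X) if row[f] <= t]
    hi = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            fit_tree([X[i] for i in lo], [y[i] for i in lo], features, depth - 1),
            fit_tree([X[i] for i in hi], [y[i] for i in hi], features, depth - 1))

def predict(tree, row):
    """Walk from the root to a leaf; internal nodes are tuples, leaves are ints."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def fitness(mask, X, y, penalty=0.01):
    """Hold-out accuracy on the selected genes minus an assumed size penalty."""
    features = [i for i, bit in enumerate(mask) if bit]
    if not features:
        return 0.0
    train, valid = range(0, len(X), 2), range(1, len(X), 2)  # even/odd split
    tree = fit_tree([X[i] for i in train], [y[i] for i in train], features)
    acc = sum(predict(tree, X[i]) == y[i] for i in valid) / len(valid)
    return acc - penalty * len(features)

def binary_pso(X, y, n_particles=10, n_iter=20, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO: each particle is a 0/1 mask over genes."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    pos = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(n_particles)]
    vel = [[0.0] * n_genes for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, X, y) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_genes):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-4.0, min(4.0, v))  # clamp velocity
                # Sigmoid of the velocity gives the probability of selecting gene d.
                pos[i][d] = int(rng.random() < 1.0 / (1.0 + math.exp(-vel[i][d])))
            f = fitness(pos[i], X, y)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

def make_toy_data(n_samples=40, n_genes=12, seed=1):
    """Synthetic data (not from the paper): gene 0 separates the classes, the rest is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = (i // 2) % 2
        row = [3.0 * label + rng.uniform(-0.5, 0.5)]
        row += [rng.uniform(0.0, 1.0) for _ in range(n_genes - 1)]
        X.append(row)
        y.append(label)
    return X, y

X, y = make_toy_data()
mask, fit = binary_pso(X, y)
print("selected genes:", [i for i, bit in enumerate(mask) if bit], "fitness:", fit)
```

The size penalty is one simple way to favor small gene subsets; a real study would more likely use cross-validation and an established tree learner such as C4.5 or CART.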
Pages: 10
Related papers
43 entries in total
[1]
A feature selection technique for classificatory analysis [J].
Ahmad, A ;
Dey, L .
PATTERN RECOGNITION LETTERS, 2005, 26 (01) :43-56
[2]
Gene selection in cancer classification using PSO/SVM and GA/SVM hybrid algorithms [J].
Alba, Enrique ;
Garcia-Nieto, Jose ;
Jourdan, Laetitia ;
Talbi, El-Ghazali .
2007 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION, VOLS 1-10, PROCEEDINGS, 2007, :284-+
[3]
[Anonymous], 2012, GEMS DATASET
[4]
microPred: effective classification of pre-miRNAs for human miRNA gene prediction [J].
Batuwita, Rukshan ;
Palade, Vasile .
BIOINFORMATICS, 2009, 25 (08) :989-995
[5]
Gene expression data analysis [J].
Brazma, A ;
Vilo, J .
FEBS LETTERS, 2000, 480 (01) :17-24
[6]
Knowledge-based analysis of microarray gene expression data by using support vector machines [J].
Brown, MPS ;
Grundy, WN ;
Lin, D ;
Cristianini, N ;
Sugnet, CW ;
Furey, TS ;
Ares, M ;
Haussler, D .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2000, 97 (01) :262-267
[7]
Particle swarm optimization for feature selection with application in obstructive sleep apnea diagnosis [J].
Chen, Li-Fei ;
Su, Chao-Ton ;
Chen, Kun-Huang ;
Wang, Pa-Chun .
NEURAL COMPUTING & APPLICATIONS, 2012, 21 (08) :2087-2096
[8]
Microarray meta-analysis database (M2DB): a uniformly pre-processed, quality controlled, and manually curated human clinical microarray database [J].
Cheng, Wei-Chung ;
Tsai, Min-Lung ;
Chang, Cheng-Wei ;
Huang, Ching-Lung ;
Chen, Chaang-Ray ;
Shu, Wun-Yi ;
Lee, Yun-Shien ;
Wang, Tzu-Hao ;
Hong, Ji-Hong ;
Li, Chia-Yang ;
Hsu, Ian C. .
BMC BIOINFORMATICS, 2010, 11
[9]
SUPPORT-VECTOR NETWORKS [J].
CORTES, C ;
VAPNIK, V .
MACHINE LEARNING, 1995, 20 (03) :273-297
[10]
Comparison of discrimination methods for the classification of tumors using gene expression data [J].
Dudoit, S ;
Fridlyand, J ;
Speed, TP .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2002, 97 (457) :77-87