Gene selection for cancer classification using wrapper approaches

Cited by: 41
Authors
Blanco, R [1]
Larrañaga, P [1]
Inza, I [1]
Sierra, B [1]
Affiliations
[1] Univ Basque Country, Comp Sci & Artificial Intelligence Dept, San Sebastian 20080, Spain
Keywords
feature subset selection; DNA microarrays; supervised classification; naive-Bayes; estimation of distribution algorithms
DOI
10.1142/S0218001404003800
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Although cancer classification has improved considerably, no general method for classifying known types of cancer has yet been developed. In this work, we propose the use of supervised classification techniques, coupled with feature subset selection algorithms, to perform this classification automatically on gene expression datasets. Because gene expression datasets contain a large number of features, the search for a highly accurate combination of features is carried out by means of the new Estimation of Distribution Algorithms paradigm. To assess the accuracy level of the proposed approach, the naive-Bayes classification algorithm is employed in a wrapper form. Promising results are achieved, along with a considerable reduction in the number of genes. By stating the optimal selection of genes as a search task, the approach makes an automatic and robust choice of the genes finally selected, in contrast to previous works on the same type of problem.
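The wrapper idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the EDA variant is UMDA (a simple univariate EDA), the naive-Bayes model uses Gaussian likelihoods with a small variance-smoothing constant, the wrapper score is leave-one-out accuracy, and the dataset is a tiny synthetic one in which only gene 0 is informative. The function names (`nb_predict`, `loo_accuracy`, `umda_select`) and all parameter values are hypothetical.

```python
import math
import random


def nb_predict(train_X, train_y, x, genes):
    """Classify x with Gaussian naive Bayes using only the columns in `genes`."""
    best_label, best_score = None, -math.inf
    for label in sorted(set(train_y)):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        score = math.log(len(rows) / len(train_X))  # log prior of the class
        for g in genes:
            vals = [r[g] for r in rows]
            mean = sum(vals) / len(vals)
            # Assumed smoothing term keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in vals) / len(vals) + 1e-3
            score += -0.5 * math.log(2 * math.pi * var) - (x[g] - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def loo_accuracy(X, y, genes):
    """Wrapper score: leave-one-out accuracy of naive Bayes on a gene subset."""
    if not genes:
        return 0.0
    hits = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += nb_predict(train_X, train_y, X[i], genes) == y[i]
    return hits / len(X)


def umda_select(X, y, pop_size=20, n_best=8, generations=10, seed=0):
    """Search gene subsets with UMDA, one assumed instance of an EDA."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    probs = [0.5] * n_genes  # P(gene is selected), one value per gene
    best = (-1.0, 0, [])     # (accuracy, -subset size, genes)
    for _generation in range(generations):
        scored = []
        for _individual in range(pop_size):
            genes = [g for g in range(n_genes) if rng.random() < probs[g]]
            scored.append((loo_accuracy(X, y, genes), -len(genes), genes))
        scored.sort(reverse=True)  # higher accuracy first, then fewer genes
        best = max(best, scored[0])
        elite = scored[:n_best]
        # Re-estimate each marginal from the elite, clamped to keep diversity.
        probs = [min(0.95, max(0.05, sum(g in e[2] for e in elite) / n_best))
                 for g in range(n_genes)]
    return best[2], best[0]


# Synthetic data (assumption): 12 samples, 6 genes, only gene 0 separates classes.
noise = random.Random(1)
X = [[5.0 * (i % 2) + 0.1 * (i // 2)] + [noise.random() for _ in range(5)]
     for i in range(12)]
y = [i % 2 for i in range(12)]

selected, acc = umda_select(X, y)
print("selected genes:", selected, "leave-one-out accuracy:", acc)
```

Ties in accuracy are broken in favour of smaller subsets here, which is one simple way to encourage the reduction in the number of genes that the abstract mentions; a real study would evaluate on actual microarray data and would likely use a more careful validation scheme.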
Pages: 1373-1390
Number of pages: 18