Microarray gene expression classification with few genes: Criteria to combine attribute selection and classification methods

Cited by: 30
Authors
Alonso-Gonzalez, Carlos J. [1 ]
Isaac Moro-Sancho, Q. [1 ]
Simon-Hurtado, Arancha [1 ]
Varela-Arrabal, Ricardo [1 ]
Affiliations
[1] Univ Valladolid, Escuela Tecn Super Ingn Informat, Dept Informat, Intelligent Syst Grp GSI, E-47011 Valladolid, Spain
Keywords
Microarray data classification; Feature selection; Machine learning; Efficient classification with few genes; CELL LUNG-CANCER; STATISTICAL COMPARISONS; PREDICT SURVIVAL; TUMOR; PATTERNS; LEUKEMIA; IDENTIFICATION; CLASSIFIERS; DISCOVERY; PROFILES;
DOI
10.1016/j.eswa.2012.01.096
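Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
140502 [Artificial intelligence];
Abstract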
Microarray data classification is a task involving high dimensionality and small sample sizes. A common criterion for deciding on the number of selected genes is to maximize accuracy, which risks overfitting and usually selects more genes than are actually needed. We propose relaxing the maximum-accuracy criterion: select the combination of attribute selection and classification algorithm that, using fewer attributes, has an accuracy not statistically significantly worse than the best. We also give some advice on choosing a suitable combination of attribute selection and classification algorithms to obtain good accuracy with a low number of gene expressions. We used some well-known attribute selection methods (FCBF, ReliefF and SVM-RFE, plus a Random selection used as a baseline technique) and classification techniques (Naive Bayes, 3 Nearest Neighbor and SVM with linear kernel), applied to 30 data sets involving different cancer types. (C) 2012 Elsevier Ltd. All rights reserved.
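The relaxed selection criterion described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which statistical test the authors use, so the paired t statistic below is an assumption made only for illustration, as are the function names `paired_t` and `select_combination`, the `t_crit` parameter, and the toy accuracy values (they are not results from the paper).

```python
from math import copysign, inf, sqrt
from statistics import mean, stdev


def paired_t(a, b):
    """Paired t statistic for the per-data-set differences a - b."""
    d = [x - y for x, y in zip(a, b)]
    s = stdev(d)
    if s == 0:
        # All differences identical: no variance to test against.
        return 0.0 if mean(d) == 0 else copysign(inf, mean(d))
    return mean(d) / (s / sqrt(len(d)))


def select_combination(results, t_crit):
    """Pick the combination with the fewest genes whose accuracy is not
    significantly worse than the most accurate combination.

    results maps (selector, classifier, n_genes) to a list of accuracies,
    one per data set. t_crit is a one-sided critical value chosen by the
    caller (hypothetical parameter, not taken from the paper).
    """
    best_key = max(results, key=lambda k: mean(results[k]))
    best = results[best_key]
    # Keep the best combination plus every one not significantly worse.
    candidates = [
        k for k in results
        if k == best_key or paired_t(best, results[k]) <= t_crit
    ]
    # Fewest genes first; ties broken by higher mean accuracy.
    return min(candidates, key=lambda k: (k[2], -mean(results[k])))


# Toy accuracies on 5 data sets (invented for illustration only).
results = {
    ("SVM-RFE", "SVM", 100): [0.95, 0.90, 0.92, 0.97, 0.91],
    ("FCBF", "NB", 10): [0.94, 0.91, 0.90, 0.97, 0.90],
    ("Random", "3NN", 10): [0.70, 0.65, 0.72, 0.68, 0.66],
}

# 2.132 is the one-sided 5% critical value of Student's t with 4 degrees of freedom.
chosen = select_combination(results, t_crit=2.132)
print(chosen)
```

In this toy setting the 100-gene combination has the highest mean accuracy, but a 10-gene combination is not significantly worse under the assumed test, so the sketch prefers it. A rank-based test could be substituted for `paired_t` without changing the selection logic.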
Pages: 7270-7280
Number of pages: 11