Using a genetic algorithm and a perceptron for feature selection and supervised class learning in DNA microarray data

Cited by: 20
Authors
Karzynski, M [1]
Mateos, A [1]
Herrero, J [1]
Dopazo, J [1]
Affiliations
[1] Ctr Nacl Invest Oncol, Bioinformat Unit, Madrid 28029, Spain
Keywords
clustering; dimensionality reduction; feature selection; gene expression; genetic algorithm; perceptron; SOTA; weights; GROWING NEURAL-NETWORK; CLUSTERING ANALYSIS; EXPRESSION DATA; CLASSIFICATION; PREDICTION; CANCER
DOI
10.1023/A:1026032530166
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Class prediction and feature selection are key in the context of diagnostic applications of DNA microarrays. Microarray data is noisy and typically composed of a small number of samples and a large number of genes. Perceptrons can constitute an efficient tool for accurate classification of microarray data. Nevertheless, the large input layers necessary for the direct application of perceptrons and the small number of samples available for the training process hamper their use. Two strategies can be taken to make optimal use of a perceptron with a favourable balance between the number of training samples and the size of the input layer: (a) reducing the dimensionality of the data set from thousands of values to no more than one hundred highly informative average values, and using the weights of the perceptron for feature selection, or (b) using a selection of only a few genes that produce an optimal classification with the perceptron. In the latter case, feature selection is carried out first. A combined approach is also possible. In this manuscript we explore and compare both alternatives. We study the informative content of the data at different levels of compression with a very efficient clustering algorithm (Self Organizing Tree Algorithm). We show how a simple genetic algorithm selects a subset of gene expression values that classifies the samples with 100% accuracy and maximum efficiency. Finally, the importance of dimensionality reduction is discussed in light of its capacity for reducing noise and redundancies in microarray data.
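Strategy (b) can be illustrated with a minimal sketch in which a genetic algorithm searches for a small gene subset and a perceptron scores each candidate subset. This is not the authors' implementation: the synthetic data, the leave-one-out fitness, the crossover and mutation scheme, and all names and parameter values (`ga_select`, `n_genes`, `pop_size`, `mut_rate`, and so on) are assumptions chosen only for illustration.

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=0.1):
    """Classic single-layer perceptron; labels y must be -1 or +1."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:      # misclassified: apply update rule
                w += lr * yi * xi
                b += lr * yi
                errors += 1
        if errors == 0:                      # training set is separated
            break
    return w, b

def loo_accuracy(X, y):
    """Leave-one-out accuracy, used here as the (assumed) GA fitness."""
    n, hits = len(y), 0
    for i in range(n):
        keep = np.arange(n) != i
        w, b = train_perceptron(X[keep], y[keep])
        hits += int(np.where(X[i] @ w + b > 0, 1, -1) == y[i])
    return hits / n

def ga_select(X, y, n_genes=4, pop_size=12, generations=10, mut_rate=0.2, seed=0):
    """Evolve fixed-size gene subsets; return (best subset, its fitness)."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    pop = [rng.choice(n_features, n_genes, replace=False) for _ in range(pop_size)]
    best, best_fit = None, -1.0
    for _ in range(generations):
        fits = [loo_accuracy(X[:, ind], y) for ind in pop]
        order = np.argsort(fits)[::-1]       # best individuals first
        if fits[order[0]] > best_fit:
            best, best_fit = pop[order[0]].copy(), fits[order[0]]
        if best_fit == 1.0:                  # perfect classification reached
            break
        parents = [pop[k] for k in order[:pop_size // 2]]   # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            i, j = rng.choice(len(parents), 2, replace=False)
            pool = np.union1d(parents[i], parents[j])        # crossover: mix genes
            child = rng.choice(pool, n_genes, replace=False)
            if rng.random() < mut_rate:      # mutation: swap in an unused gene
                unused = np.setdiff1d(np.arange(n_features), child)
                child[rng.integers(n_genes)] = rng.choice(unused)
            children.append(child)
        pop = parents + children
    return np.sort(best), best_fit

# Synthetic stand-in for microarray data: 20 samples, 100 genes,
# of which only genes 0-2 carry class information (an assumption).
rng = np.random.default_rng(42)
y = np.array([1] * 10 + [-1] * 10)
X = rng.normal(size=(20, 100))
X[:, :3] += 2.0 * y[:, None]

genes, fit = ga_select(X, y)
print("selected genes:", genes, "leave-one-out accuracy:", fit)
```

The sketch keeps the subset size fixed so that the perceptron's input layer stays small relative to the number of samples, which is the balance the abstract emphasises. Real expression data would also need normalisation and an independent test set, neither of which is shown here.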
Pages: 39-51
Page count: 13