Data mining for gene expression profiles from DNA microarray

Cited by: 14
Authors
Cho, SB [1]
Won, HH [1]
Affiliation
[1] Yonsei Univ, Dept Comp Sci, Seoul 120749, South Korea
Keywords
biological data mining; feature selection; classification; gene expression profile; MLP; KNN; SVM; SASOM; ensemble classifier; CLASSIFICATION; CANCER; PREDICTION
DOI
10.1142/S0218194003001469
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Microarray technology has supplied a large volume of data, turning many problems in biology into computational problems. As a result, techniques for extracting useful information from the data have been developed. In particular, microarray technology has been applied to the prediction and diagnosis of cancer, where it is expected to help predict and diagnose cancer accurately. To classify cancer precisely, we have to select the genes related to cancer, because the gene expression data extracted from microarrays is noisy. In this paper, we explore seven feature selection methods and four classifiers, and propose ensemble classifiers, on three benchmark datasets to systematically evaluate the performance of the feature selection methods and machine learning classifiers. The three benchmark datasets are the leukemia cancer dataset, the colon cancer dataset, and the lymphoma cancer dataset. To improve classification performance, the classifiers are combined by majority voting, weighted voting, and a Bayesian approach. Experimental results show that the ensemble with several basis classifiers produces the best recognition rate on the benchmark datasets.
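To make two of the combination rules named in the abstract concrete, the following is a minimal Python sketch of majority voting and weighted voting over the labels predicted by several base classifiers. It is an illustration under stated assumptions, not the paper's implementation: the function names, the tie-breaking rule, the class labels, and the weights (imagined here as something like per-classifier validation accuracy) are all hypothetical, since the abstract does not specify them. The Bayesian combination is not shown.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the largest number of classifiers.

    Ties go to the tied label that appears first in `predictions`
    (an assumed tie-breaking rule, chosen only for this sketch).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight.

    `weights[i]` is a hypothetical confidence for classifier i,
    for example its accuracy on held-out data.
    """
    totals = {}
    for label, weight in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)


# Illustrative labels from three base classifiers for one sample.
predictions = ["ALL", "AML", "ALL"]
# Hypothetical weights: the second classifier is trusted far more.
weights = [0.5, 0.95, 0.4]

majority_label = majority_vote(predictions)
weighted_label = weighted_vote(predictions, weights)
```

With these illustrative inputs the two rules disagree: two of three classifiers vote for one label, but the single dissenting classifier carries more total weight (0.95 against 0.5 + 0.4), so the weighted vote follows it.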
Pages: 593-608
Number of pages: 16