Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA/KNN method

Times cited: 433
Authors
Li, LP [1]
Weinberg, CR
Darden, TA
Pedersen, LG
Affiliations
[1] NIEHS, Biostat Branch, Res Triangle Pk, NC 27709 USA
[2] NIEHS, Lab Struct Biol, Res Triangle Pk, NC 27709 USA
[3] Univ N Carolina, Dept Chem, Chapel Hill, NC 27599 USA
DOI
10.1093/bioinformatics/17.12.1131
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: We recently introduced a multivariate approach that selects a subset of predictive genes jointly for sample classification based on expression data. We tested the algorithm on colon and leukemia data sets. As an extension to our earlier work, we systematically examine the sensitivity, reproducibility and stability of gene selection/sample classification to the choice of parameters of the algorithm.
Methods: Our approach combines a Genetic Algorithm (GA) and the k-Nearest Neighbor (KNN) method to identify genes that can jointly discriminate between different classes of samples (e.g. normal versus tumor). The GA/KNN method is a stochastic supervised pattern recognition method. The genes identified are subsequently used to classify independent test set samples.
Results: The GA/KNN method is capable of selecting a subset of predictive genes from a large noisy data set for sample classification. It is a multivariate approach that can capture the correlated structure in the data. We find that for a given data set gene selection is highly repeatable in independent runs using the GA/KNN method. In general, however, gene selection may be less robust than classification.
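The abstract describes GA/KNN only at a high level, so the following is a minimal Python sketch of the general idea, not the authors' implementation. It assumes details this record does not give: each chromosome is a fixed-size set of d gene indices; its fitness is the leave-one-out count of training samples that a Euclidean-distance, majority-vote KNN classifies correctly; and each new generation is built by truncation selection, a crossover that draws d genes from the union of two parents, and single-gene mutation. All function names (`knn_predict`, `loo_fitness`, `ga_knn`, `gene_frequencies`) and parameter defaults are hypothetical. `gene_frequencies` loosely mirrors the repeatability check in the Results by repeating independent runs and counting how often each gene appears in chromosomes that classify every training sample.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, genes, k=3):
    """Classify x by majority vote of its k nearest training samples.

    Assumption: Euclidean distance over the selected gene indices only.
    """
    dists = []
    for row, label in zip(train_X, train_y):
        d = math.sqrt(sum((row[g] - x[g]) ** 2 for g in genes))
        dists.append((d, label))
    dists.sort(key=lambda t: t[0])
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_fitness(X, y, genes, k=3):
    """Leave-one-out count of training samples that KNN classifies correctly."""
    correct = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, X[i], genes, k) == y[i]:
            correct += 1
    return correct


def ga_knn(X, y, d=5, pop_size=20, generations=30, k=3,
           mutation_rate=0.2, rng=None):
    """Evolve d-gene subsets (hypothetical operators); return (genes, fitness)."""
    if rng is None:
        rng = random.Random()
    n_genes = len(X[0])
    population = [rng.sample(range(n_genes), d) for _ in range(pop_size)]
    best, best_fit = None, -1
    for _ in range(generations):
        scored = sorted(((loo_fitness(X, y, c, k), c) for c in population),
                        key=lambda t: t[0], reverse=True)
        if scored[0][0] > best_fit:
            best_fit, best = scored[0]
        if best_fit == len(X):  # every training sample classified correctly
            break
        parents = [c for _, c in scored[:pop_size // 2]]  # truncation selection
        children = [list(scored[0][1])]  # elitism: carry over the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # crossover: draw d distinct genes from the union of both parents
            child = rng.sample(sorted(set(a) | set(b)), d)
            # mutation: swap one gene for a gene not already in the child
            unused = [g for g in range(n_genes) if g not in child]
            if unused and rng.random() < mutation_rate:
                child[rng.randrange(d)] = rng.choice(unused)
            children.append(child)
        population = children
    return sorted(best), best_fit


def gene_frequencies(X, y, runs=10, **ga_kwargs):
    """Count genes across independently seeded runs that reach perfect fitness."""
    counts = Counter()
    for seed in range(runs):
        genes, fit = ga_knn(X, y, rng=random.Random(seed), **ga_kwargs)
        if fit == len(X):
            counts.update(genes)
    return counts


if __name__ == "__main__":
    # Toy data (made up for illustration): gene 0 separates the classes,
    # genes 1-3 are pure noise.
    data_rng = random.Random(42)
    X, y = [], []
    for i in range(12):
        label = i % 2
        X.append([3.0 * label + data_rng.gauss(0, 0.3)]
                 + [data_rng.gauss(0, 1) for _ in range(3)])
        y.append(label)
    print(ga_knn(X, y, d=2, k=3, rng=random.Random(0)))
    print(gene_frequencies(X, y, runs=5, d=2, k=3))
```

Seeding each run separately makes the runs independent yet reproducible. On data with real class signal, the informative genes should dominate the counts returned by `gene_frequencies`, which is the kind of repeatability the abstract reports. The actual method's selection criteria and parameters may differ.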
Pages: 1131-1142
Page count: 12