Multi-class clustering and prediction in the analysis of microarray data

Times cited: 21
Authors
Tsai, CA
Lee, TC
Ho, IC
Yang, UC
Chen, CH
Chen, JJ [1]
Affiliations
[1] US FDA, Natl Ctr Toxicol Res, Div Biometry & Risk Assessment, HFT 20, Jefferson, AR 72079 USA
[2] Natl Yang Ming Univ, Inst Biopharmaceut Sci, Taipei 112, Taiwan
[3] Acad Sinica, Inst Biomed Sci, Taipei 115, Taiwan
[4] Natl Yang Ming Univ, Inst Biochem, Taipei 112, Taiwan
[5] Acad Sinica, Inst Stat Sci, Taipei 115, Taiwan
Keywords
bagged clustering; bagging fuzzy clustering; gene selection; k-NN classification; Rand statistic; shaded similarity matrix plot
DOI
10.1016/j.mbs.2004.07.002
Chinese Library Classification number
Q [Biological sciences]
Discipline classification code
07; 0710; 09
Abstract
DNA microarray technology provides tools for studying the expression profiles of a large number of distinct genes simultaneously. This technology has been applied to sample clustering and sample prediction. Because so many genes are measured, many of the genes in the original data set are irrelevant to the analysis, so the selection of discriminatory genes is critical to the accuracy of clustering and prediction. This paper considers a statistical significance testing approach to selecting discriminatory gene sets for multi-class clustering and prediction of experimental samples. A toxicogenomic data set with nine treatments (a control and eight metals: As, Cd, Ni, Cr, Sb, Pb, Cu, and AsV; 55 samples in total) is used to illustrate a general framework for the approach. Among the four gene sets selected, the set Ω₁, formed by intersecting the set of genes selected by the F-test with the union of the sets selected by the one-versus-all t-tests, performs best for both clustering and prediction. Hierarchical clustering and two modified partitioning (k-means) methods all show that Ω₁ groups the 55 samples reasonably well into seven clusters, in which the As and AsV samples form one cluster, as do the Cd and Cu samples. For prediction, the overall accuracy of the k-nearest-neighbor algorithm using Ω₁ to assign the 55 samples to one of the nine treatments is 85%. © 2004 Elsevier Inc. All rights reserved.
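The gene-set rule summarized in the abstract (Ω₁ = genes passing the F-test ∩ the union of genes passing each one-versus-all t-test), followed by nearest-neighbor prediction, can be illustrated with a minimal sketch. This is not the authors' implementation and rests on several assumptions: the paper selects genes by statistical significance (p-values, possibly with multiple-testing adjustment), whereas this dependency-free sketch thresholds the raw F and |t| statistics at hypothetical cutoffs `f_cut` and `t_cut`; the pooled-variance t statistic, Euclidean distance, k = 3, leave-one-out evaluation, all function names, and the three-gene, nine-sample data set are likewise invented for illustration.

```python
import math
from collections import Counter
from statistics import mean, variance


def f_statistic(values, labels):
    """One-way ANOVA F statistic for one gene across all treatment groups."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    k, n = len(groups), len(values)
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def t_one_vs_all(values, labels, target):
    """Pooled-variance two-sample t statistic: `target` samples vs all others.

    Each side needs at least two samples (a requirement of statistics.variance).
    """
    a = [v for v, lab in zip(values, labels) if lab == target]
    b = [v for v, lab in zip(values, labels) if lab != target]
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    diff = mean(a) - mean(b)
    if sp2 == 0:
        return math.copysign(math.inf, diff) if diff != 0 else 0.0
    return diff / math.sqrt(sp2 * (1 / na + 1 / nb))


def select_omega1(expr, labels, f_cut, t_cut):
    """Omega_1 = (F-test set) & (union over classes of one-vs-all t-test sets).

    Hypothetical simplification: cutoffs are on raw statistics, not p-values.
    """
    f_set = {g for g, vals in expr.items() if f_statistic(vals, labels) > f_cut}
    t_union = set()
    for c in sorted(set(labels)):
        t_union |= {g for g, vals in expr.items()
                    if abs(t_one_vs_all(vals, labels, c)) > t_cut}
    return f_set & t_union


def knn_loo_accuracy(expr, labels, genes, k=3):
    """Leave-one-out k-NN accuracy with Euclidean distance on the selected genes."""
    genes = sorted(genes)
    n = len(labels)
    vecs = [[expr[g][i] for g in genes] for i in range(n)]
    correct = 0
    for i in range(n):
        # Distances from sample i to every other sample, nearest first.
        neighbors = sorted((math.dist(vecs[i], vecs[j]), labels[j])
                           for j in range(n) if j != i)
        votes = Counter(lab for _, lab in neighbors[:k])
        correct += votes.most_common(1)[0][0] == labels[i]
    return correct / n


# Invented toy data (not from the paper): 3 treatments x 3 samples, 3 genes.
labels = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
expr = {
    "g1": [5.0, 5.2, 4.9, 1.0, 1.1, 0.9, 1.0, 1.2, 0.8],  # separates treatment A
    "g2": [1.0, 2.0, 1.5, 1.4, 1.1, 1.9, 1.6, 1.2, 1.3],  # uninformative noise
    "g3": [0.0, 0.1, 0.2, 3.0, 3.1, 2.9, 0.1, 0.0, 0.2],  # separates treatment B
}
omega1 = select_omega1(expr, labels, f_cut=10.0, t_cut=3.0)
print(sorted(omega1))                               # ['g1', 'g3']
print(knn_loo_accuracy(expr, labels, omega1, k=3))  # 1.0
```

Because the genes here are selected on all samples before the leave-one-out loop, the resulting accuracy can be optimistic (the selection-bias issue discussed by Ambroise and McLachlan [2]); a stricter evaluation would repeat the selection inside each leave-one-out iteration.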
Pages: 79-100
Page count: 22
Related papers (24 in total)
[1] Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences of the United States of America, 1999, 96(12): 6745-6750.
[2] Ambroise C, McLachlan GJ. Selection bias in gene extraction on the basis of microarray gene-expression data. Proceedings of the National Academy of Sciences of the United States of America, 2002, 99(10): 6562-6566.
[3] [Anonymous]. Resampling-based multiple testing: Examples and methods for P-value adjustment. 1993.
[4] Chen JJW, Wu R, Yang PC, Huang JY, Sher YP, Han MH, Kao WC, Lee PJ, Chiu TF, Chang F, Chu YW, Wu CW, Peck K. Profiling expression patterns and isolating differentially expressed genes by cDNA microarray system with colorimetry detection. Genomics, 1998, 51(3): 313-324.
[5] Cox TF. MULTIDIMENSIONAL SCA. 1994.
[6] Datta S, Datta S. Comparisons and validation of statistical clustering techniques for microarray gene expression data. Bioinformatics, 2003, 19(4): 459-466.
[7] Dudoit S, Fridlyand J, Speed TP. Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 2002, 97(457): 77-87.
[8] Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences of the United States of America, 1998, 95(25): 14863-14868.
[9] Fisher RA. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 1936, 7: 179-188.
[10] Gale N, Halperin WC, Costanzo CM. Unclassed matrix shading and optimal ordering in hierarchical cluster analysis. Journal of Classification, 1984, 1(1): 75-92.