An unsupervised hierarchical dynamic self-organizing approach to cancer class discovery and marker gene identification in microarray data

被引:53
作者
Hsu, AL [1 ]
Tang, SL
Halgamuge, SK
机构
[1] Univ Melbourne, Dept Mech & Mfg Engn, Mechatron Res Grp, Parkville, Vic 3010, Australia
[2] Univ Melbourne, Sch Vet Sci, Parkville, Vic 3010, Australia
关键词
D O I
10.1093/bioinformatics/btg296
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Motivation: Current Self-Organizing Maps (SOMs) approaches to gene expression pattern clustering require the user to predefine the number of clusters likely to be expected. Hierarchical clustering methods used in this area do not provide unique partitioning of data. We describe an unsupervised dynamic hierarchical self-organizing approach, which suggests an appropriate number of clusters, to perform class discovery and marker gene identification in microarray data. In the process of class discovery, the proposed algorithm identifies corresponding sets of predictor genes that best distinguish one class from other classes. The approach integrates merits of hierarchical clustering with robustness against noise known from self-organizing approaches. Results: The proposed algorithm applied to DNA microarray data sets of two types of cancers has demonstrated its ability to produce the most suitable number of clusters. Further, the corresponding marker genes identified through the unsupervised algorithm also have a strong biological relationship to the specific cancer class. The algorithm tested on leukemia microarray data, which contains three leukemia types, was able to determine three major and one minor cluster. Prediction models built for the four clusters indicate that the prediction strength for the smaller cluster is generally low, therefore labelled as uncertain cluster. Further analysis shows that the uncertain cluster can be subdivided further, and the subdivisions are related to two of the original clusters. Another test performed using colon cancer microarray data has automatically derived two clusters, which is consistent with the number of classes in data (cancerous and normal).
引用
收藏
页码:2131 / 2140
页数:10
相关论文
共 40 条
[1]   Dynamic self-organizing maps with controlled growth for knowledge discovery [J].
Alahakoon, D ;
Halgamuge, SK ;
Srinivasan, B .
IEEE TRANSACTIONS ON NEURAL NETWORKS, 2000, 11 (03) :601-614
[2]   Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling [J].
Alizadeh, AA ;
Eisen, MB ;
Davis, RE ;
Ma, C ;
Lossos, IS ;
Rosenwald, A ;
Boldrick, JG ;
Sabet, H ;
Tran, T ;
Yu, X ;
Powell, JI ;
Yang, LM ;
Marti, GE ;
Moore, T ;
Hudson, J ;
Lu, LS ;
Lewis, DB ;
Tibshirani, R ;
Sherlock, G ;
Chan, WC ;
Greiner, TC ;
Weisenburger, DD ;
Armitage, JO ;
Warnke, R ;
Levy, R ;
Wilson, W ;
Grever, MR ;
Byrd, JC ;
Botstein, D ;
Brown, PO ;
Staudt, LM .
NATURE, 2000, 403 (6769) :503-511
[3]   Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays [J].
Alon, U ;
Barkai, N ;
Notterman, DA ;
Gish, K ;
Ybarra, S ;
Mack, D ;
Levine, AJ .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1999, 96 (12) :6745-6750
[4]  
AMIRGHOFRAN Z, 1999, IRN J MED SCI, V24, P40
[5]  
[Anonymous], NIPS 2001 WORKSH MAC
[6]   MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia [J].
Armstrong, SA ;
Staunton, JE ;
Silverman, LB ;
Pieters, R ;
de Boer, ML ;
Minden, MD ;
Sallan, SE ;
Lander, ES ;
Golub, TR ;
Korsmeyer, SJ .
NATURE GENETICS, 2002, 30 (01) :41-47
[7]  
BENDOR A, 2000, P 4 ANN INT C COMP M
[8]  
Blackmore J., 1995, P 12 INT C MACH LEAR
[9]  
DUDOIT S, 2000, COMP DISCRIMINATION
[10]   GROWING CELL STRUCTURES - A SELF-ORGANIZING NETWORK FOR UNSUPERVISED AND SUPERVISED LEARNING [J].
FRITZKE, B .
NEURAL NETWORKS, 1994, 7 (09) :1441-1460