A hierarchical unsupervised growing neural network for clustering gene expression patterns

Cited by: 404
Authors
Herrero, J
Valencia, A
Dopazo, J
Affiliations
[1] CNIO, Madrid 28220, Spain
[2] CSIC, CNB, Prot Design Grp, E-28049 Madrid, Spain
DOI
10.1093/bioinformatics/17.2.126
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: We describe a new approach to the analysis of gene expression data from DNA array experiments, using an unsupervised neural network. DNA array technologies allow thousands of genes to be monitored rapidly and efficiently. One aim of these studies is to find correlated gene expression patterns, which is usually achieved by clustering them. The Self-Organising Tree Algorithm (SOTA) (Dopazo,J. and Carazo,J.M. (1997) J. Mol. Evol., 44, 226-233) is a neural network that grows by adopting the topology of a binary tree. The result of the algorithm is a hierarchical clustering obtained with the accuracy and robustness of a neural network.

Results: SOTA clustering confers several advantages over classical hierarchical clustering methods. SOTA is a divisive method: clustering proceeds from top to bottom, i.e. the highest hierarchical levels are resolved before the details of the lowest levels. The growing can be stopped at the desired hierarchical level. Moreover, a criterion for stopping the growth of the tree is provided, based on the approximate probability distribution obtained by randomising the original data set. By means of this criterion, statistical support for the definition of clusters is proposed. In addition, obtaining average gene expression patterns is a built-in feature of the algorithm: the neurons defining the different hierarchical levels represent the averages of the gene expression patterns contained in the corresponding clusters. Since SOTA runtimes scale approximately linearly with the number of items to be classified, it is especially suitable for dealing with huge amounts of data. The proposed method is very general and applies to any data, provided that they can be coded as a series of numbers and that a computable measure of similarity between data items can be used.
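The divisive growth cycle summarised in the abstract can be illustrated with a toy sketch. This is not the authors' implementation, and every name in it is hypothetical. It assumes Euclidean distance as the similarity measure; illustrative learning rates for the winning cell, its mother node and its sister cell (`eta_w`, `eta_m`, `eta_s`); a fixed number of training epochs per cycle instead of a convergence test; and splitting of the cell whose assigned profiles have the largest mean distance to it (its "resource"). The randomisation threshold is also an assumption: each profile's values are shuffled and the `alpha` quantile of distances between random pairs of shuffled profiles is used as the cut-off.

```python
import math
import random

def euclid(a, b):
    """Euclidean distance between two expression profiles (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class Node:
    """A neuron in the binary tree; nodes without children are terminal cells."""
    def __init__(self, vector, parent=None):
        self.vector = list(vector)
        self.parent = parent
        self.children = []

    def is_cell(self):
        return not self.children

def cells(root):
    """Terminal cells of the tree, left to right."""
    if root.is_cell():
        return [root]
    return [c for child in root.children for c in cells(child)]

def move(node, item, rate):
    """Shift a neuron's vector a fraction `rate` of the way towards `item`."""
    node.vector = [w + rate * (x - w) for w, x in zip(node.vector, item)]

def train(root, data, epochs, eta_w=0.1, eta_m=0.05, eta_s=0.01):
    # Hypothetical rates, ordered winner > mother > sister.
    for _ in range(epochs):
        for item in data:
            win = min(cells(root), key=lambda c: euclid(c.vector, item))
            move(win, item, eta_w)
            mother = win.parent
            if mother is not None:
                move(mother, item, eta_m)
                for sister in mother.children:
                    if sister is not win and sister.is_cell():
                        move(sister, item, eta_s)

def assign(root, data):
    """Group each profile under its closest terminal cell."""
    leaves = cells(root)
    groups = [[] for _ in leaves]
    for item in data:
        k = min(range(len(leaves)), key=lambda i: euclid(leaves[i].vector, item))
        groups[k].append(item)
    return leaves, groups

def resource(cell, items):
    """Mean distance from a cell to its assigned profiles (heterogeneity)."""
    if not items:
        return 0.0
    return sum(euclid(cell.vector, x) for x in items) / len(items)

def split(cell):
    """Turn a cell into a node with two daughter cells copying its vector."""
    cell.children = [Node(cell.vector, cell), Node(cell.vector, cell)]

def random_threshold(data, alpha=0.05, pairs=2000, seed=0):
    """Assumed criterion: alpha-quantile of distances between shuffled profiles."""
    rng = random.Random(seed)
    shuffled = [rng.sample(v, len(v)) for v in data]
    dists = sorted(euclid(*rng.sample(shuffled, 2)) for _ in range(pairs))
    return dists[int(alpha * (len(dists) - 1))]

def sota(data, threshold, max_cells=16, epochs=20):
    """Divisive growth: train, then split the most heterogeneous cell."""
    centroid = [sum(col) / len(data) for col in zip(*data)]
    root = Node(centroid)
    split(root)  # start from a root node with two cells
    while True:
        train(root, data, epochs)
        leaves, groups = assign(root, data)
        res = [resource(c, g) for c, g in zip(leaves, groups)]
        worst = max(range(len(leaves)), key=lambda i: res[i])
        if res[worst] <= threshold or len(leaves) >= max_cells:
            return root, leaves, groups
        split(leaves[worst])
```

For example, `sota(data, threshold=random_threshold(data))` keeps splitting until no terminal cell is more heterogeneous than the assumed randomisation cut-off, or until `max_cells` is reached. Because each training cycle compares every profile only against the current terminal cells, the work per cycle grows linearly with the number of profiles. This is consistent with the near-linear runtime claimed in the abstract.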
Pages: 126-136
Number of pages: 11
Related papers (30 in total)
[1] Alon, U; Barkai, N; Notterman, DA; Gish, K; Ybarra, S; Mack, D; Levine, AJ. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences of the United States of America, 1999, 96(12): 6745-6750.
[2] Ben-Dor, A; Shamir, R; Yakhini, Z. Clustering gene expression patterns. Journal of Computational Biology, 1999, 6(3-4): 281-297.
[3] Cho, RJ; Campbell, MJ; Winzeler, EA; Steinmetz, L; Conway, A; Wodicka, L; Wolfsberg, TG; Gabrielian, AE; Landsman, D; Lockhart, DJ; Davis, RW. A genome-wide transcriptional analysis of the mitotic cell cycle. Molecular Cell, 1998, 2(1): 65-73.
[4] Debouck, C; Goodfellow, PN. DNA microarrays in drug discovery and development. Nature Genetics, 1999, 21(Suppl 1): 48-50.
[5] DeRisi, JL; Iyer, VR; Brown, PO. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science, 1997, 278(5338): 680-686.
[6] Dopazo, J; Carazo, JM. Phylogenetic reconstruction using an unsupervised growing neural network that adopts the topology of a phylogenetic tree. Journal of Molecular Evolution, 1997, 44(2): 226-233.
[7] Efron, B; Tibshirani, R. Statistical data analysis in the computer age. Science, 1991, 253(5018): 390-395.
[8] Eisen, MB; Spellman, PT; Brown, PO; Botstein, D. Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences of the United States of America, 1998, 95(25): 14863-14868.
[9] Fritzke, B. Growing cell structures: a self-organizing network for unsupervised and supervised learning. Neural Networks, 1994, 7(9): 1441-1460.
[10] Gray, NS; Wodicka, L; Thunnissen, AMWH; Norman, TC; Kwon, SJ; Espinoza, FH; Morgan, DO; Barnes, G; LeClerc, S; Meijer, L; Kim, SH; Lockhart, DJ; Schultz, PG. Exploiting chemical libraries, structure, and genomics in the search for kinase inhibitors. Science, 1998, 281(5376): 533-538.