Evaluation and optimization of clustering in gene expression data analysis

Cited by: 28
Authors
Famili, AF [1]
Liu, GM [1]
Liu, ZY [1]
Affiliation
[1] Natl Res Council Canada, Inst Informat Technol, Ottawa, ON K1A 0R6, Canada
DOI
10.1093/bioinformatics/bth124
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: A measure of cluster quality is needed to select candidate clusters of genes that contain biologically relevant patterns of gene expression. Such a measure is especially valuable when a large number of gene expression profiles must be analyzed and suitable clusters of genes must be identified for further analysis, such as searching for meaningful patterns, identifying gene functions or analyzing gene responses.
Results: We propose a new method for measuring cluster quality, called stability, with which unsupervised learning of gene expression data can be performed efficiently. The method takes into account a cluster's stability on partition. We evaluate the method and demonstrate its performance on four independent real gene expression datasets and three simulated datasets, and show that it outperforms other techniques reported in the literature. The method can be applied both to evaluating clustering validity and to identifying stable clusters.
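The abstract does not define the stability measure itself. The sketch below is only a generic illustration of the broader idea of stability-based cluster validation. It is not the authors' definition. It assumes that a cluster is "stable" if it is recovered when a random subsample of the data is re-clustered. Each reference cluster is scored by its best Jaccard match among the new clusters, averaged over rounds. Everything here is hypothetical and chosen only for illustration: the helper names (`kmeans`, `cluster_stability`, `group`, `jaccard`), the parameters `n_rounds`, `frac` and `seed`, the Jaccard matching, and the use of k-means with farthest-point initialisation.

```python
import random
from typing import Callable, Sequence

Point = Sequence[float]
# Hypothetical clusterer type: maps a list of points to one integer label per point.
Clusterer = Callable[[list], list]


def sq_dist(p: Point, q: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points: list, k: int, iters: int = 50) -> list[int]:
    """Lloyd's k-means with deterministic farthest-point initialisation (illustrative)."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members (keep it if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def group(indices: Sequence[int], labels: Sequence[int]) -> list[set[int]]:
    """Turn parallel (index, label) sequences into one set of indices per cluster."""
    clusters: dict[int, set[int]] = {}
    for i, lab in zip(indices, labels):
        clusters.setdefault(lab, set()).add(i)
    return list(clusters.values())


def jaccard(a: set[int], b: set[int]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_stability(points: list, cluster_fn: Clusterer,
                      n_rounds: int = 20, frac: float = 0.8, seed: int = 0) -> list[float]:
    """Assumed per-cluster stability: mean best-Jaccard match of each reference
    cluster across re-clusterings of random subsamples of the data."""
    rng = random.Random(seed)
    n = len(points)
    reference = group(range(n), cluster_fn(list(points)))
    totals = [0.0] * len(reference)
    counts = [0] * len(reference)
    for _ in range(n_rounds):
        sample = sorted(rng.sample(range(n), max(1, int(frac * n))))
        kept = set(sample)
        sub_clusters = group(sample, cluster_fn([points[i] for i in sample]))
        for j, ref in enumerate(reference):
            ref_kept = ref & kept
            if not ref_kept:  # cluster absent from this subsample; skip the round
                continue
            totals[j] += max(jaccard(ref_kept, c) for c in sub_clusters)
            counts[j] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


if __name__ == "__main__":
    # Toy data standing in for expression profiles: two well-separated groups.
    blob_a = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.2, 0.2)]
    blob_b = [(5.0, 5.0), (5.2, 5.1), (5.1, 5.3), (5.3, 5.2), (5.2, 5.2)]
    data = blob_a + blob_b
    print(cluster_stability(data, lambda pts: kmeans(pts, k=2)))  # prints [1.0, 1.0]
```

With two well-separated groups and k = 2, both clusters are recovered in every subsample, so each scores 1.0. Requesting more clusters than the data support tends to split a group in ways that vary between subsamples, which lowers the scores of the affected clusters. Any clustering routine with the same input/output shape could be passed as `cluster_fn`.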
Pages: 1535-1545
Number of pages: 11
Related papers (22 in total)
  • [1] Azuaje, F. A cluster validity framework for genome expression data. Bioinformatics, 2002, 18(2): 319-320.
  • [2] Ben-Hur Asa, 2002, Pac Symp Biocomput, P6
  • [3] Cho, RJ; Campbell, MJ; Winzeler, EA; Steinmetz, L; Conway, A; Wodicka, L; Wolfsberg, TG; Gabrielian, AE; Landsman, D; Lockhart, DJ; Davis, RW. A genome-wide transcriptional analysis of the mitotic cell cycle. Molecular Cell, 1998, 2(1): 65-73.
  • [4] Datta, S; Datta, S. Comparisons and validation of statistical clustering techniques for microarray gene expression data. Bioinformatics, 2003, 19(4): 459-466.
  • [5] Dudoit S, 2002, GENOME BIOL, V3
  • [6] Eisen, MB; Spellman, PT; Brown, PO; Botstein, D. Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences of the United States of America, 1998, 95(25): 14863-14868.
  • [7] FAMILI A, 2003, P 21 IASTED INT MULT, P32
  • [8] FAMILI A, 2003, UNPUB 17 INT C IND E
  • [9] FISKE D, 1983, CLUSTER ANAL SOCIAL, P104
  • [10] Giurcaneanu, CD; Tabus, I; Shmulevich, I; Zhang, W. Stability-based cluster analysis applied to microarray data. Seventh International Symposium on Signal Processing and Its Applications, Vol. 2, Proceedings, 2003: 57-60.