Judging the quality of gene expression-based clustering methods using gene annotation

Cited by: 220
Authors
Gibbons, FD [1]
Roth, FP [1]
Affiliations
[1] Harvard Univ, Sch Med, Dept Biol Chem & Mol Pharmacol, Boston, MA 02115 USA
Keywords
DOI
10.1101/gr.397002
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
We compare several commonly used expression-based gene clustering algorithms using a figure of merit based on the mutual information between cluster membership and known gene attributes. By studying various publicly available expression data sets we conclude that enrichment of clusters for biological function is, in general, highest at rather low cluster numbers. As a measure of dissimilarity between the expression patterns of two genes, no method outperforms Euclidean distance for ratio-based measurements, or Pearson distance for non-ratio-based measurements at the optimal choice of cluster number. We show the self-organized-map approach to be best for both measurement types at higher numbers of clusters. Clusters of genes derived from single- and average-linkage hierarchical clustering tend to produce worse-than-random results.
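The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: it computes the plain mutual information between cluster labels and a single gene attribute, whereas the abstract only says the figure of merit is "based on" mutual information. The function names are hypothetical, and defining Pearson distance as one minus the Pearson correlation is an assumption, since the abstract does not give a formula.

```python
import math
from collections import Counter


def mutual_information(clusters, attribute):
    """Mutual information (in bits) between two equal-length label sequences.

    `clusters` holds one cluster label per gene; `attribute` holds one
    annotation value per gene (for example True/False for a functional term).
    """
    n = len(clusters)
    if n == 0 or n != len(attribute):
        raise ValueError("inputs must be non-empty and of equal length")
    joint = Counter(zip(clusters, attribute))
    count_c = Counter(clusters)
    count_a = Counter(attribute)
    mi = 0.0
    for (c, a), n_ca in joint.items():
        # p(c, a) * log2( p(c, a) / (p(c) * p(a)) ), written with raw counts
        mi += (n_ca / n) * math.log2(n_ca * n / (count_c[c] * count_a[a]))
    return mi


def euclidean_distance(x, y):
    """Euclidean distance between two expression profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have equal length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def pearson_distance(x, y):
    """Assumed definition: 1 - Pearson correlation of two profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - cov / math.sqrt(var_x * var_y)
```

A clustering that separates annotated from unannotated genes perfectly gives the maximum value for a balanced binary attribute (1 bit), while a clustering that is independent of the attribute gives 0.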
Pages: 1574 - 1581
Page count: 8
Related papers
41 in total
  • [1] Systematic management and analysis of yeast gene expression data
    Aach, J
    Rindone, W
    Church, GM
    [J]. GENOME RESEARCH, 2000, 10 (04) : 431 - 445
  • [2] ANGELO M, 1999, GENECLUSTER
  • [3] [Anonymous], 1980, CLUSTER ANAL
  • [4] Gene Ontology: tool for the unification of biology
    Ashburner, M
    Ball, CA
    Blake, JA
    Botstein, D
    Butler, H
    Cherry, JM
    Davis, AP
    Dolinski, K
    Dwight, SS
    Eppig, JT
    Harris, MA
    Hill, DP
    Issel-Tarver, L
    Kasarskis, A
    Lewis, S
    Matese, JC
    Richardson, JE
    Ringwald, M
    Rubin, GM
    Sherlock, G
    [J]. NATURE GENETICS, 2000, 25 (01) : 25 - 29
  • [5] BEAZLEY DM, 1998, PERL EXTENSION BUILD
  • [6] BEAZLEY DM, 2001, SWIG USERS MANUAL V
  • [7] Ben-Hur Asa, 2002, Pac Symp Biocomput, P6
  • [8] Knowledge-based analysis of microarray gene expression data by using support vector machines
    Brown, MPS
    Grundy, WN
    Lin, D
    Cristianini, N
    Sugnet, CW
    Furey, TS
    Ares, M
    Haussler, D
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2000, 97 (01) : 262 - 267
  • [9] A genome-wide transcriptional analysis of the mitotic cell cycle
    Cho, RJ
    Campbell, MJ
    Winzeler, EA
    Steinmetz, L
    Conway, A
    Wodicka, L
    Wolfsberg, TG
    Gabrielian, AE
    Landsman, D
    Lockhart, DJ
    Davis, RW
    [J]. MOLECULAR CELL, 1998, 2 (01) : 65 - 73
  • [10] Discrimination between paralogs using microarray analysis: Application to the Yap1p and Yap2p transcriptional networks
    Cohen, BA
    Pilpel, Y
    Mitra, RD
    Church, GM
    [J]. MOLECULAR BIOLOGY OF THE CELL, 2002, 13 (05) : 1608 - 1614