A systematic comparison and evaluation of biclustering methods for gene expression data

Cited by: 584
Authors
Prelic, A
Bleuler, S [1]
Zimmermann, P
Wille, A
Bühlmann, P
Gruissem, W
Hennig, L
Thiele, L
Zitzler, E
Affiliations
[1] ETH, Comp Engn & Networks Lab, CH-8092 Zurich, Switzerland
[2] ETH, Inst Plant Sci, CH-8092 Zurich, Switzerland
[3] ETH, Funct Genom Ctr Zurich, CH-8092 Zurich, Switzerland
[4] ETH, Colab, CH-8092 Zurich, Switzerland
[5] ETH, Seminar Stat, CH-8092 Zurich, Switzerland
DOI
10.1093/bioinformatics/btl060
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: In recent years, there have been various efforts to overcome the limitations of standard clustering approaches for the analysis of gene expression data by grouping genes and samples simultaneously. The underlying concept, often referred to as biclustering, makes it possible to identify sets of genes that share compatible expression patterns across subsets of samples, and its usefulness has been demonstrated for different organisms and datasets. Several biclustering methods have been proposed in the literature; however, it is not clear how the different techniques compare with each other with respect to the biological relevance of the clusters or to other characteristics such as robustness and sensitivity to noise. Accordingly, no guidelines concerning the choice of biclustering method are currently available.

Results: First, this paper provides a methodology for comparing and validating biclustering methods that includes a simple binary reference model. Although this model captures the essential features of most biclustering approaches, it is still simple enough that all optimal groupings can be determined exactly; to this end, we propose a fast divide-and-conquer algorithm (Bimax). Second, we evaluate the performance of five salient biclustering algorithms, together with the reference model and a hierarchical clustering method, on various synthetic and real datasets for Saccharomyces cerevisiae and Arabidopsis thaliana. The comparison reveals that (1) biclustering in general has advantages over a conventional hierarchical clustering approach, (2) there are considerable performance differences between the tested methods and (3) even the simple reference model delivers relevant patterns within all considered settings.
Pages: 1122-1129
Page count: 8