An improved algorithm for clustering gene expression data

Times cited: 125
Authors
Bandyopadhyay, Sanghamitra
Mukhopadhyay, Anirban [1]
Maulik, Ujjwal
Affiliations
[1] Univ Kalyani, Dept Comp Sci & Engn, Kalyani 741235, W Bengal, India
[2] Indian Stat Inst, Machine Intelligence Unit, Kolkata 700108, India
[3] Jadavpur Univ, Dept Comp Sci & Engn, Kolkata 700032, India
Keywords
DOI
10.1093/bioinformatics/btm418
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Recent advancements in microarray technology allow simultaneous monitoring of the expression levels of a large number of genes over different time points. Clustering is an important tool for analyzing such microarray data, which are typically characterized by inherent uncertainty, noise and imprecision. In this article, a two-stage clustering algorithm is proposed that employs a recently proposed variable string length genetic scheme and a multiobjective genetic clustering algorithm. It is based on the novel concept of points having significant membership to multiple classes. An iterated version of the well-known Fuzzy C-Means is also utilized for clustering.
Results: The proposed two-stage clustering algorithm is shown to be significantly superior to the average linkage method, Self Organizing Map (SOM) and a recently developed weighted Chinese restaurant-based clustering method (CRC), which are widely used methods for clustering gene expression data, on a variety of artificial and publicly available real-life data sets. The biological relevance of the clustering solutions is also analyzed.
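The abstract gives no implementation details, so the following is only a minimal sketch of textbook Fuzzy C-Means, not the authors' two-stage algorithm. It illustrates what "significant membership to multiple classes" could mean in practice. The function names, the optional `init` argument and the `threshold` value of 0.3 are assumptions made for illustration and do not come from the paper.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, init=None, seed=0):
    """Textbook fuzzy c-means. Returns (centers, u), where u[k][i] is the
    membership of point k in cluster i. `init` (hypothetical parameter)
    optionally supplies the starting centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in (init if init is not None else rng.sample(points, c))]
    exponent = 2.0 / (m - 1.0)
    dims = len(points[0])
    u = []
    for _ in range(iters):
        # Membership update: closer centers receive higher membership.
        u = []
        for p in points:
            d = [math.dist(p, v) for v in centers]
            if min(d) == 0.0:
                # The point coincides with one or more centers.
                hits = sum(1 for di in d if di == 0.0)
                row = [1.0 / hits if di == 0.0 else 0.0 for di in d]
            else:
                row = [1.0 / sum((d[i] / d[j]) ** exponent for j in range(c))
                       for i in range(c)]
            u.append(row)
        # Center update: mean of all points weighted by membership ** m.
        new_centers = []
        for i in range(c):
            w = [row[i] ** m for row in u]
            total = sum(w)
            new_centers.append(
                [sum(wk * p[j] for wk, p in zip(w, points)) / total
                 for j in range(dims)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u


def ambiguous_points(u, threshold=0.3):
    """Indices of points whose second-largest membership reaches `threshold`
    (an assumed cut-off, not a value taken from the paper)."""
    return [k for k, row in enumerate(u)
            if len(row) > 1 and sorted(row, reverse=True)[1] >= threshold]


if __name__ == "__main__":
    # Two tight groups plus one point halfway between them (made-up data).
    data = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2),
            (5.0, 5.0), (4.8, 5.0), (5.0, 4.8),
            (2.5, 2.5)]
    centers, u = fuzzy_c_means(data, 2, init=[data[0], data[3]])
    print(ambiguous_points(u))
```

In this toy data set the halfway point is the one that belongs substantially to both clusters. Such points are the kind the abstract singles out, although how the paper treats them is not stated here.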
Pages: 2859 - 2865
Page count: 7
Related papers
26 in total
  • [1] FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes
    Al-Shahrour, F
    Díaz-Uriarte, R
    Dopazo, J
    [J]. BIOINFORMATICS, 2004, 20 (04) : 578 - 580
  • [2] [Anonymous], Pattern Recognition With Fuzzy Objective Function Algorithms
  • [3] Gene Ontology: tool for the unification of biology
    Ashburner, M
    Ball, CA
    Blake, JA
    Botstein, D
    Butler, H
    Cherry, JM
    Davis, AP
    Dolinski, K
    Dwight, SS
    Eppig, JT
    Harris, MA
    Hill, DP
    Issel-Tarver, L
    Kasarskis, A
    Lewis, S
    Matese, JC
    Richardson, JE
    Ringwald, M
    Rubin, GM
    Sherlock, G
    [J]. NATURE GENETICS, 2000, 25 (01) : 25 - 29
  • [4] Multiobjective genetic clustering for pixel classification in remote sensing imagery
    Bandyopadhyay, Sanghamitra
    Maulik, Ujjwal
    Mukhopadhyay, Anirban
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2007, 45 (05) : 1506 - 1511
  • [5] The transcriptional program of sporulation in budding yeast
    Chu, S
    DeRisi, J
    Eisen, M
    Mulholland, J
    Botstein, D
    Brown, PO
    Herskowitz, I
    [J]. SCIENCE, 1998, 282 (5389) : 699 - 705
  • [6] A fast and elitist multiobjective genetic algorithm: NSGA-II
    Deb, K
    Pratap, A
    Agarwal, S
    Meyarivan, T
    [J]. IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2002, 6 (02) : 182 - 197
  • [7] Fuzzy C-means method for clustering microarray data
    Dembélé, D
    Kastner, P
    [J]. BIOINFORMATICS, 2003, 19 (08) : 973 - 980
  • [8] Cluster analysis and display of genome-wide expression patterns
    Eisen, MB
    Spellman, PT
    Brown, PO
    Botstein, D
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1998, 95 (25) : 14863 - 14868
  • [9] A new convergence proof of fuzzy c-means
    Gröll, L
    Jäkel, J
    [J]. IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2005, 13 (05) : 717 - 720
  • [10] Hollander M., 1999, Nonparametric Statistical Methods