Cancer class prediction: Two stage clustering approach to identify informative genes

Cited by: 6
Authors
Alshalalfah, Mohammed [1]
Alhajj, Reda [1,2]
Affiliations
[1] Univ Calgary, Dept Comp Sci, Calgary, AB T2N 1N4, Canada
[2] Global Univ, Dept Comp Sci, Beirut, Lebanon
Keywords
Clustering; classification; microarray; validity analysis; support vector machines; fuzziness parameter; Fuzzy C-means; MICROARRAY DATA; CLASSIFICATION; SELECTION; PROFILE
DOI
10.3233/IDA-2009-0386
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Cancer classification is an important research area that has attracted the attention of several research groups over the last decades. However, there has been no generally agreed-upon approach for assigning tumors to known classes (a.k.a. class prediction). One challenge in microarray analysis, especially of cancerous gene expression profiles, is to identify genes or groups of genes that are highly expressed in tumor cells but not in normal cells, and vice versa. The methods described in the literature all deal with features obtained directly from the data. Further, several clustering techniques, such as k-means and self-organizing maps, have been proposed for the analysis of genome expression data. However, these methods do not provide information about the influence of a given gene on the overall shape of the clusters. In this paper, we try to generate informative data that can be more powerful for the classification of genes. We identify a reduced set of features capable of distinguishing between two classes by two-stage clustering of genes using fuzzy c-means. In the first stage, the proposed method clusters the original data. In the second stage, it clusters the genes within each of the clusters produced by the first stage. We chose fuzzy c-means because a fuzzy model fits gene expression data analysis better: a gene can belong to several clusters, with a degree of membership in each. However, choosing the fuzziness parameter m is a major problem when applying fuzzy c-means to clustering. In this approach, we try to better identify the value of the fuzziness parameter when applying fuzzy c-means to microarray data. Support vector machines combined with different kernel functions are used for classification. The results of experiments conducted on three benchmark data sets (including one multi-class data set) demonstrate the applicability and effectiveness of the proposed approach compared with other approaches described in the literature.
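The abstract describes the two-stage procedure only at a high level. As a minimal sketch, assuming standard fuzzy c-means (Bezdek's update rules), Euclidean distance, a fuzziness parameter m fixed by the caller, and hard assignment of each gene to its highest-membership cluster between the two stages, the clustering could look like the Python below. The function names `fuzzy_c_means`, `hard_assign` and `two_stage_fcm` are hypothetical; the paper's own procedure for choosing m, its feature construction, and the SVM classification step are not reproduced here.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]

# Hypothetical sketch: m is supplied by the caller, not tuned as in the paper.


def fuzzy_c_means(data: Matrix, c: int, m: float = 2.0, max_iter: int = 100,
                  tol: float = 1e-5, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Standard fuzzy c-means. Returns (centres, u), where u[i][k] is the
    degree of membership of row i (e.g. one gene) in cluster k; rows of u sum to 1."""
    if m <= 1.0:
        raise ValueError("fuzziness parameter m must be greater than 1")
    n, dim = len(data), len(data[0])
    rng = random.Random(seed)
    # Random initial memberships, normalised so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centres = [[0.0] * dim for _ in range(c)]
    exponent = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Centre update: v_k = sum_i u_ik^m * x_i / sum_i u_ik^m
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            if total > 0.0:
                centres[k] = [sum(w[i] * data[i][d] for i in range(n)) / total
                              for d in range(dim)]
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
        max_change = 0.0
        for i in range(n):
            # Floor distances to avoid division by zero when a point sits on a centre.
            dists = [max(math.dist(data[i], v), 1e-12) for v in centres]
            new_row = [1.0 / sum((dists[k] / dists[j]) ** exponent for j in range(c))
                       for k in range(c)]
            max_change = max(max_change, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if max_change < tol:
            break
    return centres, u


def hard_assign(u: Matrix) -> List[List[int]]:
    """Group row indices by the cluster in which they have the highest membership."""
    groups: List[List[int]] = [[] for _ in range(len(u[0]))]
    for i, row in enumerate(u):
        groups[max(range(len(row)), key=row.__getitem__)].append(i)
    return groups


def two_stage_fcm(genes: Matrix, c1: int, c2: int, m: float = 2.0) -> List[List[List[int]]]:
    """Stage 1 clusters all genes; stage 2 re-clusters the genes of each
    stage-1 cluster. Returns, per stage-1 cluster, its sub-clusters as
    lists of original gene indices."""
    _, u1 = fuzzy_c_means(genes, c1, m)
    result = []
    for members in hard_assign(u1):
        if len(members) <= c2:
            # Too few genes to split further: keep the group (if non-empty) whole.
            result.append([members] if members else [])
            continue
        _, u2 = fuzzy_c_means([genes[i] for i in members], c2, m)
        result.append([[members[j] for j in sub] for sub in hard_assign(u2)])
    return result
```

For example, `two_stage_fcm(expression_rows, c1=2, c2=2)` returns, for each first-stage cluster, the gene indices of its second-stage sub-clusters. The membership matrix `u` is where the fuzzy model differs from k-means: every gene keeps a graded membership in every cluster. As m approaches 1 these memberships become nearly crisp, while larger m spreads them more evenly across clusters, which is why the choice of m matters.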
Pages: 671-686
Page count: 16