Use of SVD-based probit transformation in clustering gene expression profiles

Cited by: 14
Authors
Liang, Faming [1]
Affiliation
[1] Texas A&M Univ, Dept Stat, College Stn, TX 77843 USA
Funding
National Science Foundation (USA)
Keywords
gene expression profiles; model-based clustering; probit transformation; singular value decomposition
DOI
10.1016/j.csda.2007.01.022
Chinese Library Classification number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
The mixture-Gaussian model-based clustering method has received much attention in clustering gene expression profiles in the literature of bioinformatics. However, this method suffers from two difficulties in applications. The first one is on the parameter estimation, which becomes difficult when the dimension of the data is high or the size of a cluster is small. The second one is on the normality assumption for gene expression levels, which is seldom satisfied by real data. In this paper, we propose to overcome these two difficulties by the probit transformation in conjunction with the singular value decomposition (SVD). SVD reduces the dimensionality of the data, and the probit transformation converts the scaled eigensamples, which can be interpreted as correlation coefficients as explained in the text, into Gaussian random variables. Our numerical results show that the SVD-based probit transformation enhances the ability of the mixture-Gaussian model-based clustering method for identifying prominent patterns of the data. As a by-product, we show that the SVD-based probit transformation also improves the performance of the model-free clustering methods, such as hierarchical, K-means and self-organizing maps (SOM), for the data sets containing scattered genes. In this paper, we also propose a run test-based rule for selection of eigensamples used for clustering. (C) 2007 Elsevier B.V. All rights reserved.
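The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact scaling, so everything here is an assumption made for illustration: genes are rows, each row is centred and scaled to unit length so that its projection onto a right singular vector lies in [-1, 1] and behaves like a correlation coefficient, and that value is mapped to (0, 1) via (r + 1) / 2 before the inverse normal CDF is applied. The function name `svd_probit_features` and the parameters `n_components` and `eps` are hypothetical, and the paper's run test-based rule for choosing eigensamples is not reproduced.

```python
from statistics import NormalDist

import numpy as np


def svd_probit_features(expr, n_components, eps=1e-6):
    """Illustrative SVD + probit transform (assumed details, not the paper's recipe).

    expr: 2-D array, one gene per row and one sample per column.
    Returns (r, z): correlation-like projections and their probit transforms.
    """
    x = np.asarray(expr, dtype=float)
    # Centre each gene profile and scale it to unit length (assumed scaling).
    x = x - x.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave constant profiles as all-zero rows
    x = x / norms
    # Rows of vt are unit-length right singular vectors of the scaled data.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    # Projection of a unit-length row onto a unit vector lies in [-1, 1].
    r = x @ vt[:n_components].T
    # Map [-1, 1] to (0, 1), clip away the endpoints, then apply the probit.
    p = np.clip((r + 1.0) / 2.0, eps, 1.0 - eps)
    probit = np.vectorize(NormalDist().inv_cdf)
    return r, probit(p)


# Usage on synthetic data: 20 genes measured in 6 samples.
rng = np.random.default_rng(0)
r, z = svd_probit_features(rng.normal(size=(20, 6)), n_components=2)
```

The transformed matrix `z` could then be passed to any clustering routine, for example a Gaussian mixture fit or K-means, in place of the raw expression profiles.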
Pages: 6355-6366
Number of pages: 12