Use of SVD-based probit transformation in clustering gene expression profiles

Cited by: 14

Authors:

Liang, Faming [1]

Affiliations:

[1] Texas A&M Univ, Dept Stat, College Stn, TX 77843 USA

Source:

COMPUTATIONAL STATISTICS & DATA ANALYSIS | 2007 / Vol. 51 / No. 12

Funding:

National Science Foundation (USA);

Keywords:

gene expression profiles; model-based clustering; probit transformation; singular value decomposition;

DOI:

10.1016/j.csda.2007.01.022

Chinese Library Classification:

TP39 [Computer applications];

Subject classification codes:

081203 ; 0835 ;

Abstract:

The mixture-Gaussian model-based clustering method has received much attention in clustering gene expression profiles in the literature of bioinformatics. However, this method suffers from two difficulties in applications. The first one is on the parameter estimation, which becomes difficult when the dimension of the data is high or the size of a cluster is small. The second one is on the normality assumption for gene expression levels, which is seldom satisfied by real data. In this paper, we propose to overcome these two difficulties by the probit transformation in conjunction with the singular value decomposition (SVD). SVD reduces the dimensionality of the data, and the probit transformation converts the scaled eigensamples, which can be interpreted as correlation coefficients as explained in the text, into Gaussian random variables. Our numerical results show that the SVD-based probit transformation enhances the ability of the mixture-Gaussian model-based clustering method for identifying prominent patterns of the data. As a by-product, we show that the SVD-based probit transformation also improves the performance of the model-free clustering methods, such as hierarchical, K-means and self-organizing maps (SOM), for the data sets containing scattered genes. In this paper, we also propose a run test-based rule for selection of eigensamples used for clustering. (C) 2007 Elsevier B.V. All rights reserved.
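The pipeline named in the abstract (SVD for dimension reduction, then a probit transformation of correlation-like scores) can be illustrated with a minimal sketch. This is one plausible reading of the abstract, not the paper's exact procedure: the function name `svd_probit_transform`, the row-wise centering and unit-length scaling, the linear map from [-1, 1] to (0, 1), and the clipping constant `eps` are all assumptions introduced here for illustration. The number of eigensamples `k` is passed by hand; the paper's run test-based selection rule is not reproduced.

```python
import numpy as np
from statistics import NormalDist


def svd_probit_transform(X, k, eps=1e-6):
    """Hypothetical sketch: genes-by-samples matrix -> genes-by-k probit scores.

    Assumed steps (not taken from the paper):
      1. center each gene profile and scale it to unit length,
      2. take the leading k right singular vectors as "eigensamples",
      3. project each profile onto them, giving values in [-1, 1],
      4. map to (0, 1) and apply the inverse standard normal CDF (probit).
    """
    X = np.asarray(X, dtype=float)

    # Step 1: center and normalize each row (gene profile).
    Xc = X - X.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(Xc, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # constant profiles stay at zero
    Z = Xc / norms

    # Step 2: thin SVD; rows of Vt are unit-length right singular vectors.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)

    # Step 3: inner products of unit vectors lie in [-1, 1] (Cauchy-Schwarz),
    # so they behave like correlation coefficients.
    R = Z @ Vt[:k].T

    # Step 4: rescale to (0, 1), clip away from the endpoints, apply probit.
    P = np.clip((R + 1.0) / 2.0, eps, 1.0 - eps)
    probit = np.vectorize(NormalDist().inv_cdf, otypes=[float])
    return probit(P)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo = rng.normal(size=(100, 10))  # synthetic data, 100 genes x 10 samples
    scores = svd_probit_transform(demo, k=3)
    print(scores.shape)
```

Under these assumptions the returned scores are unbounded real values, which is the kind of input a Gaussian mixture model expects; they could then be passed to any mixture-Gaussian clustering routine, for example scikit-learn's `GaussianMixture`.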

Citation

Pages: 6355 - 6366

Number of pages: 12

30 references in total

[1] An improvement of the NEC criterion for assessing the number of clusters in a mixture model
Biernacki, C
Celeux, G
Govaert, G
[J]. PATTERN RECOGNITION LETTERS, 1999, 20 (03) : 267 - 272
[2] Carr D.B., 1997, STAT COMPUTING GRAPH, V8, P20
[3] CHANG WC, 1983, J ROY STAT SOC C, V32, P267
[4] Chen GX, 2002, STAT SINICA, V12, P241
[5] A genome-wide transcriptional analysis of the mitotic cell cycle
Cho, RJ
Campbell, MJ
Winzeler, EA
Steinmetz, L
Conway, A
Wodicka, L
Wolfsberg, TG
Gabrielian, AE
Landsman, D
Lockhart, DJ
Davis, RW
[J]. MOLECULAR CELL, 1998, 2 (01) : 65 - 73
[6] CLUSTER SEPARATION MEASURE
DAVIES, DL
BOULDIN, DW
[J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1979, 1 (02) : 224 - 227
[7] MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM
DEMPSTER, AP
LAIRD, NM
RUBIN, DB
[J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) : 1 - 38
[8] Cluster analysis and display of genome-wide expression patterns
Eisen, MB
Spellman, PT
Brown, PO
Botstein, D
[J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1998, 95 (25) : 14863 - 14868
[9] Model-based clustering, discriminant analysis, and density estimation
Fraley, C
Raftery, AE
[J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2002, 97 (458) : 611 - 631
[10] Hastie T., 2000, Genome Biology, V1, pr
