Diametrical clustering for identifying anti-correlated gene clusters

Cited by: 71
Authors
Dhillon, IS
Marcotte, EM
Roshan, U [1]
Affiliations
[1] Univ Texas, Dept Comp Sci, Austin, TX 78712 USA
[2] Univ Texas, Dept Chem & Biochem, Inst Cellular & Mol Med, Austin, TX 78712 USA
Funding
US National Science Foundation
DOI
10.1093/bioinformatics/btg209
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Clustering genes based upon their expression patterns allows us to predict gene function. Most existing clustering algorithms cluster genes together when their expression patterns show high positive correlation. However, it has been observed that genes whose expression patterns are strongly anti-correlated can also be functionally similar. Biologically, this is not unintuitive: genes responding to the same stimuli, regardless of the nature of the response, are more likely to operate in the same pathways.

Results: We present a new diametrical clustering algorithm that explicitly identifies anti-correlated clusters of genes. Our algorithm proceeds by iteratively (i) re-partitioning the genes and (ii) computing the dominant singular vector of each gene cluster, with each singular vector serving as the prototype of a 'diametric' cluster. We empirically show the effectiveness of the algorithm in identifying diametrical or anti-correlated clusters. Testing the algorithm on yeast cell cycle data, fibroblast gene expression data, and DNA microarray data from yeast mutants reveals that opposed cellular pathways can be discovered with this method. We present systems whose mRNA expression patterns, and likely their functions, oppose the yeast ribosome and proteasome, along with evidence for the inverse transcriptional regulation of a number of cellular systems.
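The two-step iteration in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and several details are assumptions: each expression profile is centered and scaled to unit norm so that inner products behave like Pearson correlations; a gene joins the prototype with the largest *squared* inner product, so strong positive and strong negative correlation count equally; each prototype (the cluster's dominant singular vector) is computed by power iteration; and the farthest-first seeding and all function names (`normalize`, `dominant_vector`, `init_prototypes`, `diametrical_clustering`) are hypothetical.

```python
import math


def normalize(profile):
    """Center a profile to zero mean and scale it to unit Euclidean norm.

    Assumption: after this, the inner product of two genes is their
    Pearson correlation.
    """
    mean = sum(profile) / len(profile)
    centered = [v - mean for v in profile]
    norm = math.sqrt(sum(v * v for v in centered))
    if norm == 0.0:
        raise ValueError("constant profile cannot be normalized")
    return [v / norm for v in centered]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dominant_vector(members, start, iters=100):
    """Power iteration on S = sum_x x x^T, i.e. the dominant left singular
    vector of the matrix whose columns are the cluster's genes."""
    v = list(start)
    for _ in range(iters):
        # S v = sum_x (x . v) x, computed without forming S explicitly
        w = [0.0] * len(v)
        for x in members:
            c = dot(x, v)
            for i, xi in enumerate(x):
                w[i] += c * xi
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            return v  # start is orthogonal to every member; keep it
        v = [wi / norm for wi in w]
    return v


def init_prototypes(genes, k):
    """Hypothetical farthest-first seeding: start from gene 0, then add the
    gene whose largest squared correlation with the chosen seeds is smallest."""
    protos = [genes[0]]
    while len(protos) < k:
        best = min(genes, key=lambda x: max(dot(x, p) ** 2 for p in protos))
        protos.append(best)
    return protos


def diametrical_clustering(profiles, k, max_iter=50):
    genes = [normalize(p) for p in profiles]
    protos = init_prototypes(genes, k)
    labels = None
    for _ in range(max_iter):
        # (i) re-partition: each gene joins the prototype it is most strongly
        # correlated OR anti-correlated with, hence the squared inner product
        new_labels = [max(range(k), key=lambda j: dot(x, protos[j]) ** 2)
                      for x in genes]
        if new_labels == labels:
            break  # partition is stable
        labels = new_labels
        # (ii) recompute each prototype as the cluster's dominant singular vector
        for j in range(k):
            members = [x for x, lab in zip(genes, labels) if lab == j]
            if members:  # an empty cluster keeps its previous prototype
                protos[j] = dominant_vector(members, protos[j])
    return labels, protos


if __name__ == "__main__":
    profiles = [[1, 2, 3, 4], [4, 3, 2, 1], [2, 4, 6, 8],
                [1, -1, -1, 1], [-1, 1, 1, -1], [2, -2, -2, 2]]
    labels, _ = diametrical_clustering(profiles, k=2)
    print(labels)  # [0, 0, 0, 1, 1, 1]
```

In this toy input, genes 0 and 1 are perfectly anti-correlated yet land in the same cluster, which is the effect of the squared criterion; a measure based on signed correlation would treat them as maximally dissimilar.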
Pages: 1612-1619
Page count: 8