Clustering gene expression patterns

Cited by: 701
Authors
Ben-Dor, A [1]
Shamir, R
Yakhini, Z
Affiliations
[1] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98105 USA
[2] Tel Aviv Univ, Sackler Fac Exact Sci, Dept Comp Sci, IL-69978 Tel Aviv, Israel
[3] Hewlett Packard Labs, Haifa, Israel
Keywords
clustering algorithms; gene expression analysis; DNA arrays; probabilistic analysis; tissue classification
DOI
10.1089/106652799318274
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Recent advances in biotechnology allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. Analysis of data produced by such experiments offers potential insight into gene function and regulatory mechanisms. A key step in the analysis of gene expression data is the detection of groups of genes that manifest similar expression patterns. The corresponding algorithmic problem is to cluster multicondition gene expression patterns. In this paper we describe a novel clustering algorithm that was developed for analysis of gene expression data. We define an appropriate stochastic error model on the input, and prove that under the conditions of the model, the algorithm recovers the cluster structure with high probability. The running time of the algorithm on an n-gene dataset is $O(n^2 (\log n)^c)$. We also present a practical heuristic based on the same algorithmic ideas. The heuristic was implemented and its performance is demonstrated on simulated data and on real gene expression data, with very promising results.
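As a hedged illustration of the clustering problem the abstract describes (it is not the authors' algorithm or their heuristic), the following Python sketch builds a similarity graph over gene expression profiles and reports its connected components as clusters. The function names `pearson` and `threshold_clusters`, the choice of Pearson correlation as the similarity measure, and the 0.8 threshold are all assumptions made for this example. Like any all-pairs method, it performs O(n^2) similarity comparisons on an n-gene dataset.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile has no defined correlation; treat as uncorrelated
    return cov / sqrt(vx * vy)


def threshold_clusters(profiles, threshold=0.8):
    """Naive baseline (assumption, not the paper's method): connected components
    of the graph that joins genes i and j when their correlation >= threshold."""
    n = len(profiles)
    parent = list(range(n))  # union-find forest over gene indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):  # all O(n^2) gene pairs
            if pearson(profiles[i], profiles[j]) >= threshold:
                parent[find(i)] = find(j)  # merge the two components

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)  # indices appended in ascending order
    return sorted(groups.values())


if __name__ == "__main__":
    # Hypothetical toy data: four genes measured under four conditions.
    profiles = [
        [1.0, 2.0, 3.0, 4.0],  # gene 0: rising
        [2.0, 4.1, 6.0, 8.2],  # gene 1: rising, roughly scaled copy of gene 0
        [4.0, 3.0, 2.0, 1.0],  # gene 2: falling
        [8.0, 6.1, 3.9, 2.0],  # gene 3: falling
    ]
    print(threshold_clusters(profiles))  # [[0, 1], [2, 3]]
```

A threshold graph like this is fragile: a single spurious high-similarity pair merges two otherwise separate clusters, and a single missed pair can split one. Errors of this kind in the input are what a stochastic error model on the similarity data is meant to capture.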
Pages: 281-297
Number of pages: 17
References (35 in total)
[1] Alon N, 1998, RANDOM STRUCT ALGOR, V13, P457, DOI 10.1002/(SICI)1098-2418(199810/12)13:3/4<457::AID-RSA14>3.0.CO;2-W
[3] Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays [J]. Proceedings of the National Academy of Sciences of the United States of America, 1999, 96(12): 6745-6750
[4] [Anonymous], [No title captured]
[5] BENDOR A, 1999, P 3 INT C COMP MOL B
[6] Blanchard AP, Hood L. Sequence to array: Probing the genome's secrets [J]. Nature Biotechnology, 1996, 14(13): 1649
[7] CONDON A, 1998, COMMUNICATION
[8] Cover TM, 2005, ELEM INF THEORY, DOI 10.1002/047174882X
[9] Dembo A, 2010, LARGE DEVIATIONS TEC
[10] DeRisi JL, Iyer VR, Brown PO. Exploring the metabolic and genetic control of gene expression on a genomic scale [J]. Science, 1997, 278(5338): 680-686