Iterative signature algorithm for the analysis of large-scale gene expression data

Cited by: 279
Authors
Bergmann, S [1]
Ihmels, J [1]
Barkai, N [1]
Affiliation
[1] Weizmann Inst Sci, Dept Mol Genet, IL-76100 Rehovot, Israel
Source
PHYSICAL REVIEW E | 2003 / Vol. 67 / Issue 3
DOI
10.1103/PhysRevE.67.031902
Chinese Library Classification (CLC) numbers
O35 [Fluid Mechanics]; O53 [Plasma Physics]
Subject classification codes
070204; 080103; 080704
Abstract
We present an approach for the analysis of genome-wide expression data. Our method is designed to overcome the limitations of traditional techniques when applied to large-scale data. Rather than allotting each gene to a single cluster, we assign both genes and conditions to context-dependent and potentially overlapping transcription modules. We provide a rigorous definition of a transcription module as the object to be retrieved from the expression data. We establish an efficient algorithm that searches for the modules encoded in the data by iteratively refining sets of genes and conditions until they match this definition. Each iteration involves a linear map, induced by the normalized expression matrix, followed by the application of a threshold function. We argue that our method is in fact a generalization of singular value decomposition, which corresponds to the special case where no threshold is applied. We show analytically that for noisy expression data our approach leads to better classification because of the threshold. This result is confirmed by numerical analyses based on in silico expression data. We briefly discuss results obtained by applying our algorithm to expression data from the yeast Saccharomyces cerevisiae.
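The iteration described in the abstract (a linear map induced by a normalized expression matrix, followed by a threshold function) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the z-score normalizations in `normalize`, the threshold form in `threshold` (zero every score whose z-score magnitude falls below `t`), and all function names are hypothetical choices for this sketch. With both thresholds set to 0 and a single matrix E passed together with its transpose, each pass becomes one power-iteration step on E E^T, which converges to the leading singular vectors. This mirrors the abstract's statement that SVD is the no-threshold special case.

```python
import math


def zscore(values):
    """Shift to mean 0 and scale to unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0] * n  # constant vector: nothing to normalize
    return [(v - mean) / std for v in values]


def normalize(E):
    """Assumed normalization of a genes x conditions matrix E.

    E_C (genes x conditions): each gene z-scored across conditions.
    E_G (conditions x genes): each condition z-scored across genes.
    """
    E_C = [zscore(row) for row in E]
    E_G = [zscore(list(col)) for col in zip(*E)]
    return E_G, E_C


def threshold(scores, t):
    """Hypothetical threshold f_t: zero scores whose |z-score| is below t.

    With t = 0 every score is kept, so the map reduces to the identity.
    """
    z = zscore(scores)
    return [s if abs(zi) >= t else 0.0 for s, zi in zip(scores, z)]


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def unit(v):
    """Scale v to unit Euclidean length (all-zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def isa(E_G, E_C, v_G, t_C, t_G, max_iter=200, tol=1e-9):
    """Alternate linear map + threshold until the gene vector stops changing.

    Returns (gene_vector, condition_vector); nonzero entries mark the module.
    """
    v_C = []
    for _ in range(max_iter):
        v_C = unit(threshold(matvec(E_G, v_G), t_C))   # genes -> conditions
        new_G = unit(threshold(matvec(E_C, v_C), t_G))  # conditions -> genes
        if max(abs(a - b) for a, b in zip(new_G, v_G)) < tol:
            return new_G, v_C
        v_G = new_G
    return v_G, v_C


if __name__ == "__main__":
    # SVD limit (assumption of this sketch): no threshold (t = 0),
    # E_G = E^T and E_C = E, so each pass is a power-iteration step on E E^T.
    E = [[2.0, 1.0], [1.0, 2.0]]
    E_T = [list(col) for col in zip(*E)]
    genes, conds = isa(E_T, E, [1.0, 0.0], t_C=0.0, t_G=0.0)
    print(genes, conds)  # both approach (1, 1)/sqrt(2)
```

To apply the sketch to real data, one would call `normalize` on a genes x conditions matrix and seed `isa` with a candidate gene vector. The choice of thresholds and seed vectors is left open here, since the abstract does not specify them.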
Pages: 18