A polynomial time biclustering algorithm for finding approximate expression patterns in gene expression time series

被引:37
作者
Madeira, Sara C. [1 ,2 ,3 ]
Oliveira, Arlindo L. [1 ,2 ]
机构
[1] INESC ID, Knowledge Discovery & Bioinformat KDBIO Grp, Lisbon, Portugal
[2] Univ Tecn Lisboa, Inst Super Tecn, P-1096 Lisbon, Portugal
[3] Univ Beira Interior, Covilha, Portugal
来源
ALGORITHMS FOR MOLECULAR BIOLOGY | 2009年 / 4卷
关键词
CLUSTERS;
D O I
10.1186/1748-7188-4-8
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Background: The ability to monitor the change in expression patterns over time, and to observe the emergence of coherent temporal responses using gene expression time series, obtained from microarray experiments, is critical to advance our understanding of complex biological processes. In this context, biclustering algorithms have been recognized as an important tool for the discovery of local expression patterns, which are crucial to unravel potential regulatory mechanisms. Although most formulations of the biclustering problem are NP-hard, when working with time series expression data the interesting biclusters can be restricted to those with contiguous columns. This restriction leads to a tractable problem and enables the design of efficient biclustering algorithms able to identify all maximal contiguous column coherent biclusters. Methods: In this work, we propose e-CCC-Biclustering, a biclustering algorithm that finds and reports all maximal contiguous column coherent biclusters with approximate expression patterns in time polynomial in the size of the time series gene expression matrix. This polynomial time complexity is achieved by manipulating a discretized version of the original matrix using efficient string processing techniques. We also propose extensions to deal with missing values, discover anticorrelated and scaled expression patterns, and different ways to compute the errors allowed in the expression patterns. We propose a scoring criterion combining the statistical significance of expression patterns with a similarity measure between overlapping biclusters. Results: We present results in real data showing the effectiveness of e-CCC-Biclustering and its relevance in the discovery of regulatory modules describing the transcriptomic expression patterns occurring in Saccharomyces cerevisiae in response to heat stress. In particular, the results show the advantage of considering approximate patterns when compared to state of the art methods that require exact matching of gene expression time series. Discussion: The identification of co-regulated genes, involved in specific biological processes, remains one of the main avenues open to researchers studying gene regulatory networks. The ability of the proposed methodology to efficiently identify sets of genes with similar expression patterns is shown to be instrumental in the discovery of relevant biological phenomena, leading to more convincing evidence of specific regulatory mechanisms.
引用
收藏
页数:39
相关论文
共 33 条
  • [1] Analysis of time-series gene expression data: Methods, challenges, and opportunities
    Androulakis, I. P.
    Yang, E.
    Almon, R. R.
    [J]. ANNUAL REVIEW OF BIOMEDICAL ENGINEERING, 2007, 9 : 205 - 228
  • [2] [Anonymous], 1997, ACM SIGACT NEWS
  • [3] Analyzing time series gene expression data
    Bar-Joseph, Z
    [J]. BIOINFORMATICS, 2004, 20 (16) : 2493 - 2503
  • [4] BicAT: a biclustering analysis toolbox
    Barkow, S
    Bleuler, S
    Prelic, A
    Zimmermann, P
    Zitzler, E
    [J]. BIOINFORMATICS, 2006, 22 (10) : 1282 - 1283
  • [5] Discovering local structure in gene expression data: The order-preserving submatrix problem
    Ben-Dor, A
    Chor, B
    Karp, R
    Yakhini, Z
    [J]. JOURNAL OF COMPUTATIONAL BIOLOGY, 2003, 10 (3-4) : 373 - 384
  • [6] Cheng Y., 2000, Proceedings International Conference on Intelligent System,s for Molecular Biology
  • [7] ISMB. International Conference on Intelligent System, V8, P93
  • [8] Genomic expression programs in the response of yeast cells to environmental changes
    Gasch, AP
    Spellman, PT
    Kao, CM
    Carmel-Harel, O
    Eisen, MB
    Storz, G
    Botstein, D
    Brown, PO
    [J]. MOLECULAR BIOLOGY OF THE CELL, 2000, 11 (12) : 4241 - 4257
  • [9] Identifying time-lagged gene clusters using gene expression data
    Ji, LP
    Tan, KL
    [J]. BIOINFORMATICS, 2005, 21 (04) : 509 - 516
  • [10] Mining gene expression data for positive and negative co-regulated gene clusters
    Ji, LP
    Tan, KL
    [J]. BIOINFORMATICS, 2004, 20 (16) : 2711 - 2718