Using hexamers to predict cis-regulatory motifs in Drosophila

Cited by: 34
Authors
Chan, BY [1]
Kibler, D [1]
Affiliations
[1] Univ Calif Irvine, Sch Informat & Comp Sci, Irvine, CA 92717 USA
DOI
10.1186/1471-2105-6-262
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Cis-regulatory modules (CRMs) are short stretches of DNA that help regulate gene expression in higher eukaryotes. They have been found up to 1 megabase away from the genes they regulate and can be located upstream, downstream, and even within their target genes. Due to the difficulty of finding CRMs using biological and computational techniques, even well-studied regulatory systems may contain CRMs that have not yet been discovered. Results: We present a simple, efficient method (HexDiff) based only on hexamer frequencies of known CRMs and non-CRM sequence to predict novel CRMs in regulatory systems. On a data set of 16 gap and pair-rule genes containing 52 known CRMs, predictions made by HexDiff had a higher correlation with the known CRMs than several existing CRM prediction algorithms: Ahab, Cluster Buster, MSCAN, MCAST, and LWF. After combining the results of the different algorithms, 10 putative CRMs were identified and are strong candidates for future study. The hexamers used by HexDiff to distinguish between CRMs and non-CRM sequence were also analyzed and were shown to be enriched in regulatory elements. Conclusion: HexDiff provides an efficient and effective means for finding new CRMs based on known CRMs, rather than known binding sites.
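The abstract states only that HexDiff relies on hexamer frequencies in known CRMs versus non-CRM sequence; it does not give the scoring rule. The Python sketch below is therefore an illustration of that general idea, not the authors' algorithm. The function names, the pseudocount, the frequency-ratio ranking, and the sliding-window hit count are all assumptions made for this example.

```python
from collections import Counter

K = 6  # hexamer length


def hexamer_counts(seqs, k=K):
    """Count overlapping k-mers made only of A/C/G/T across all sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N, etc.
                counts[kmer] += 1
    return counts


def discriminating_hexamers(crm_seqs, background_seqs, top_n=10, pseudocount=1.0):
    """Return the top_n hexamers most enriched in CRMs relative to background.

    Assumption: enrichment is a smoothed frequency ratio; the paper may
    use a different selection rule.
    """
    crm = hexamer_counts(crm_seqs)
    bg = hexamer_counts(background_seqs)
    crm_total = sum(crm.values())
    bg_total = sum(bg.values())
    vocab = 4 ** K  # number of possible hexamers
    ratios = {}
    for kmer in crm:
        f_crm = (crm[kmer] + pseudocount) / (crm_total + pseudocount * vocab)
        f_bg = (bg[kmer] + pseudocount) / (bg_total + pseudocount * vocab)
        ratios[kmer] = f_crm / f_bg
    ranked = sorted(ratios, key=ratios.get, reverse=True)
    return set(ranked[:top_n])


def window_scores(seq, hexamers, window=100, k=K):
    """Score each sliding window by how many selected hexamers start in it."""
    seq = seq.upper()
    hits = [1 if seq[i:i + k] in hexamers else 0
            for i in range(len(seq) - k + 1)]
    n_windows = max(len(hits) - window + 1, 1)
    return [sum(hits[start:start + window]) for start in range(n_windows)]
```

In use, one would build the hexamer set from known CRMs and matched non-CRM sequence, then flag windows of a candidate region whose score exceeds a threshold chosen on held-out data.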
Pages: 9
Related papers
34 in total
  • [1] Some statistical properties of regulatory DNA sequences, and their use in predicting regulatory regions in the Drosophila genome: the fluffy-tail test (art. no. 109)
    Abnizova, I
    te Boekhorst, R
    Walter, K
    Gilks, WR
    [J]. BMC BIOINFORMATICS, 2005, 6 (1)
  • [2] Searching for statistically significant regulatory modules
    Bailey, Timothy L.
    Noble, William Stafford
    [J]. BIOINFORMATICS, 2003, 19 : II16 - II25
  • [3] Assessing the accuracy of prediction algorithms for classification: an overview
    Baldi, P
    Brunak, S
    Chauvin, Y
    Andersen, CAF
    Nielsen, H
    [J]. BIOINFORMATICS, 2000, 16 (05) : 412 - 424
  • [4] Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome
    Berman, BP
    Nibu, Y
    Pfeiffer, BD
    Tomancak, P
    Celniker, SE
    Levine, M
    Rubin, GM
    Eisen, MB
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2002, 99 (02) : 757 - 762
  • [5] Discovery of regulatory elements by a computational method for phylogenetic footprinting
    Blanchette, M
    Tompa, M
    [J]. GENOME RESEARCH, 2002, 12 (05) : 739 - 748
  • [6] Phylogenetic shadowing of primate sequences to find functional regions of the human genome
    Boffelli, D
    McAuliffe, J
    Ovcharenko, D
    Lewis, KD
    Ovcharenko, I
    Pachter, L
    Rubin, EM
    [J]. SCIENCE, 2003, 299 (5611) : 1391 - 1394
  • [7] DAVIDSON EH, 2003, P NATL ACAD SCI USA, V200, P2586
  • [8] Cluster-Buster: finding dense clusters of motifs in DNA sequences
    Frith, MC
    Li, MC
    Weng, ZP
    [J]. NUCLEIC ACIDS RESEARCH, 2003, 31 (13) : 3666 - 3668
  • [9] Prediction of similarly acting cis-regulatory modules by subsequence profiling and comparative genomics in Drosophila melanogaster and D.pseudoobscura
    Grad, YH
    Roth, FP
    Halfon, MS
    Church, GM
    [J]. BIOINFORMATICS, 2004, 20 (16) : 2738 - 2750
  • [10] De novo cis-regulatory module elicitation for eukaryotic genomes
    Gupta, M
    Liu, JS
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2005, 102 (20) : 7079 - 7084