Statistical significance of clusters of motifs represented by position specific scoring matrices in nucleotide sequences

Cited by: 88
Authors
Frith, MC
Spouge, JL
Hansen, U
Weng, ZP
Affiliations
[1] Boston Univ, Bioinformat Program, Boston, MA 02215 USA
[2] Boston Univ, Dept Biomed Engn, Boston, MA 02215 USA
[3] Boston Univ, Dept Biol, Boston, MA 02215 USA
[4] Natl Lib Med, Natl Ctr Biotechnol Informat, Bethesda, MD 20894 USA
DOI
10.1093/nar/gkf438
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The human genome encodes the transcriptional control of its genes in clusters of cis-elements that constitute enhancers, silencers and promoter signals. The sequence motifs of individual cis-elements are usually too short and degenerate for confident detection. In most cases, the requirements for organization of cis-elements within these clusters are poorly understood. Therefore, we have developed a general method to detect local concentrations of cis-element motifs, using predetermined matrix representations of the cis-elements, and calculate the statistical significance of these motif clusters. The statistical significance calculation is highly accurate not only for idealized, pseudo-random DNA, but also for real human DNA. We use our method 'cluster of motifs E-value tool' (COMET) to make novel predictions concerning the regulation of genes by transcription factors associated with muscle. COMET performs comparably with two alternative state-of-the-art techniques, which are more complex and lack E-value calculations. Our statistical method enables us to clarify the major bottleneck in the hard problem of detecting cis-regulatory regions, which is that many known enhancers do not contain very significant clusters of the motif types that we search for. Thus, discovery of additional signals that belong to these regulatory regions will be the key to future progress.
Pages: 3214-3224
Number of pages: 11