Identification of functional clusters of transcription factor binding motifs in genome sequences: the MSCAN algorithm

Cited by: 62
Authors
Johansson, Oe. [2]
Alkema, W. [3]
Wasserman, W. W. [1]
Lagergren, J. [4, 5]
Affiliations
[1] Univ British Columbia, Ctr Mol Med & Therapeut, Dept Med Genet, Vancouver, BC V5Z 4H4, Canada
[2] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
[3] Karolinska Inst, Ctr Genom & Bioinformat, SE-17177 Stockholm, Sweden
[4] KTH, Dept Numer Anal & Comp Sci, SE-10044 Stockholm, Sweden
[5] KTH, Stockholm Bioinformat Ctr, SE-10044 Stockholm, Sweden
Keywords
transcription; gene networks; modules; motif; promoter;
DOI
10.1093/bioinformatics/btg1021
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: The identification of regulatory control regions within genomes is a major challenge. Studies have demonstrated that regulating regions can be described as locally dense clusters or modules of cis-acting transcription factor binding sites (TFBS). For well-described biological contexts, it is possible to train predictive algorithms to discern novel modules in genome sequences. However, utility of module detection methods has been severely limited by insufficient training data. For only a few tissues can one obtain sufficient numbers of literature-derived regulatory modules.
Results: We present a novel method, MSCAN, that circumvents the training data problem by measuring the statistical significance of any non-overlapping combination of TFBS in a window. Given a set of transcription factor binding profiles, a significance threshold, and a genomic sequence, MSCAN returns putative regulatory regions. We assess performance on two curated collections of regulatory regions; one each for tissue-specific expression in liver and skeletal muscle cells. The efficiency of MSCAN allows for predictive screens of entire genomes.
Pages: i169-i176
Page count: 8