BEAM: A beam search algorithm for the identification of Cis-regulatory elements in groups of genes

被引:16
作者
Carlson, Jonathan M. [1 ]
Chakravarty, Arijit [1 ]
Gross, Robert H. [1 ]
机构
[1] Dartmouth Coll, Dept Biol, Gilman Labs 104, Hanover, NH 03755 USA
关键词
motif finder; bounded search; transcription factor binding sites; coregulated genes; promoter motifs; gene regulation;
D O I
10.1089/cmb.2006.13.686
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
The identification of potential protein binding sites (cis-regulatory elements) in the upstream regions of genes is key to understanding the mechanisms that regulate gene expression. To this end, we present a simple, efficient algorithm, BEAM (beam-search enumerative algorithm for motif finding), aimed at the discovery of cis-regulatory elements in the DNA sequences upstream of a. related group of genes. This algorithm dramatically limits the search space of expanded sequences, converting the problem from one that is exponential in the length of motifs sought to one that is linear. Unlike sampling algorithms, our algorithm converges and is capable of finding statistically overrepresented motifs with a low failure rate. Further, our algorithm is not dependent on the objective function or the organism used. Limiting the space of candidate motifs enables the algorithm to focus only on those motifs that are most likely to be biologically relevant and enables the algorithm to use direct evaluations of background frequencies instead of resorting to probabilistic estimates. In addition, limiting the space of candidate motifs makes it possible to use computationally expensive objective functions that are able to correctly identify biologically relevant motifs.
引用
收藏
页码:686 / 701
页数:16
相关论文
共 34 条
[1]  
[Anonymous], ISMB
[2]  
[Anonymous], 1997, ALGORITHMS STRINGS T, DOI DOI 10.1017/CBO9780511574931
[3]  
BAILEY TL, 1995, MACH LEARN, V21, P51, DOI 10.1007/BF00993379
[4]  
Blanchette M, 2001, Bioinformatics, V17 Suppl 1, pS30
[5]   Approaches to the automatic discovery of patterns in biosequences [J].
Brazma, A ;
Jonassen, I ;
Eidhammer, I ;
Gilbert, D .
JOURNAL OF COMPUTATIONAL BIOLOGY, 1998, 5 (02) :279-305
[6]   Predicting gene regulatory elements in silico on a genomic scale [J].
Brazma, A ;
Jonassen, I ;
Vilo, J ;
Ukkonen, E .
GENOME RESEARCH, 1998, 8 (11) :1202-1215
[7]   THE INDIVIDUAL ERGODIC THEOREM OF INFORMATION-THEORY [J].
BREIMAN, L .
ANNALS OF MATHEMATICAL STATISTICS, 1957, 28 (03) :809-811
[8]   The effects of selection against spurious transcription factor binding sites [J].
Hahn, MW ;
Stajich, JE ;
Wray, GA .
MOLECULAR BIOLOGY AND EVOLUTION, 2003, 20 (06) :901-906
[9]   Identifying DNA and protein patterns with statistically significant alignments of multiple sequences [J].
Hertz, GZ ;
Stormo, GD .
BIOINFORMATICS, 1999, 15 (7-8) :563-577
[10]  
HERTZ GZ, 1990, COMPUT APPL BIOSCI, V6, P81