A survey of DNA motif finding algorithms

Cited by: 287
Authors
Das, Modan K. [1, 2]
Dai, Ho-Kwok [1]
Affiliations
[1] Oklahoma State Univ, Dept Comp Sci, Stillwater, OK 74078 USA
[2] Univ Arizona, USDA ARS, Dept Plant Sci, Tucson, AZ 85721 USA
Keywords
Transcription Factor Binding Site; Motif Finding; Motif Model; Orthologous Sequence; Suffix Tree
DOI
10.1186/1471-2105-8-S7-S21
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Unraveling the mechanisms that regulate gene expression is a major challenge in biology. An important task in this challenge is to identify regulatory elements, especially the binding sites in deoxyribonucleic acid (DNA) for transcription factors. These binding sites are short DNA segments called motifs. Recent advances in genome sequence availability and in high-throughput gene expression analysis technologies have allowed for the development of computational methods for motif finding. As a result, a large number of motif finding algorithms have been implemented and applied to various motif models over the past decade. This survey reviews the latest developments in DNA motif finding algorithms.
Results: Earlier algorithms use promoter sequences of coregulated genes from a single genome and search for statistically overrepresented motifs. More recent algorithms use phylogenetic footprinting or orthologous sequences, or an integrated approach that combines promoter sequences of coregulated genes with phylogenetic footprinting. All the algorithms studied have been reported to correctly detect motifs previously identified by laboratory experiments, and some were able to find novel motifs. However, most of these algorithms have been shown to work well in yeast and other lower organisms but perform significantly worse in higher organisms.
Conclusion: Despite considerable efforts to date, DNA motif finding remains a complex challenge for biologists and computer scientists. Researchers have taken many different approaches in developing motif discovery tools, and the progress made in this area of research is very encouraging. Comparing the performance of different motif finding tools and identifying the best ones has proven difficult for two reasons: the tools are built on diverse and complex algorithms and motif models, and our incomplete understanding of the biology of regulatory mechanisms does not always allow adequate evaluation of the underlying algorithms over motif models.
Pages: 13