An efficient algorithm for the identification of structured motifs in DNA promoter sequences

被引:40
作者
Carvalho, AM
Freitas, AT
Oliveira, AL
Sagot, MF
机构
[1] IST, INESC, ID, P-1000029 Lisbon, Portugal
[2] Univ Lyon 1, INRIA Rhone Alpes, F-69622 Villeurbanne, France
关键词
box-factor tree; structured motif; promoter; binding site consensus;
D O I
10.1109/TCBB.2006.16
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
We propose a new algorithm for identifying cis-regulatory modules in genomic sequences. The proposed algorithm, named RISO, uses a new data structure, called box-link, to store the information about conserved regions that occur in a well-ordered and regularly spaced manner in the data set sequences. This type of conserved regions, called structured motifs, is extremely relevant in the research of gene regulatory mechanisms since it can effectively represent promoter models. The complexity analysis shows a time and space gain over the best known exact algorithms that is exponential in the spacings between binding sites. A full implementation of the algorithm was developed and made available online. Experimental results show that the algorithm is much faster than existing ones, sometimes by more than four orders of magnitude. The application of the method to biological data sets shows its ability to extract relevant consensi.
引用
收藏
页码:126 / 140
页数:15
相关论文
共 33 条
[1]  
ALLALI J, 2004, THESIS U MARNE LA VA
[2]  
[Anonymous], 1997, ALGORITHMS STRINGS T, DOI DOI 10.1017/CBO9780511574931
[3]  
BAILEY TL, 1995, P 3 INT C INT SYST M, P21
[4]   Approaches to the automatic discovery of patterns in biosequences [J].
Brazma, A ;
Jonassen, I ;
Eidhammer, I ;
Gilbert, D .
JOURNAL OF COMPUTATIONAL BIOLOGY, 1998, 5 (02) :279-305
[5]   EXPECTATION MAXIMIZATION ALGORITHM FOR IDENTIFYING PROTEIN-BINDING SITES WITH VARIABLE LENGTHS FROM UNALIGNED DNA FRAGMENTS [J].
CARDON, LR ;
STORMO, GD .
JOURNAL OF MOLECULAR BIOLOGY, 1992, 223 (01) :159-170
[6]  
CARVALHO AM, 2005, P 3 AS PAC BIOINF C, P273
[7]  
CROCHEMORE M, IN PRESS HDB COMPUTA
[8]   Searching for regulatory elements in human noncoding sequences [J].
Duret, L ;
Bucher, P .
CURRENT OPINION IN STRUCTURAL BIOLOGY, 1997, 7 (03) :399-406
[9]  
Eskin E., 2002, BIOINFORMATICS, V18, P354, DOI DOI 10.1093/BIOINFORMATICS/18.SUPPL_1.S354
[10]  
Eskin Eleazar, 2003, Pac Symp Biocomput, P29