A Gibbs sampler for identification of symmetrically structured, spaced DNA motifs with improved estimation of the signal length

Cited by: 73
Authors
Favorov, AV
Gelfand, MS
Gerasimova, AV
Ravcheev, DA
Mironov, AA
Makeev, VJ
Affiliations
[1] State Sci Ctr GosNIIGenet, Lab Bioinformat, Moscow 117545, Russia
[2] Russian Acad Sci, Inst Informat Transmiss Problems, Moscow 127994, Russia
[3] Moscow MV Lomonosov State Univ, Dept Bioengn & Bioinformat, Moscow 119992, Russia
[4] Russian Acad Sci, VA Engelhardt Mol Biol Inst, Moscow 119991, Russia
Funding
Russian Foundation for Basic Research;
Keywords
DOI
10.1093/bioinformatics/bti336
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Transcription regulatory protein factors often bind DNA as homo-dimers or hetero-dimers. Thus they recognize structured DNA motifs that are inverted repeats, direct repeats or spaced motif pairs. However, these motifs are often difficult to identify owing to their high divergence. Including the motif structure explicitly in the motif recognition algorithm improves recognition efficiency for highly divergent motifs as well as estimation of the motif's geometric parameters.
Result: We present a modification of the Gibbs sampling motif extraction algorithm, SeSiMCMC (Sequence Similarities by Markov Chain Monte Carlo), which finds structured motifs of these types, as well as non-structured motifs, in a set of unaligned DNA sequences. It employs improved estimators of motif and spacer lengths. The probability that a sequence does not contain any motif is accounted for in a rigorous Bayesian manner. We have applied the algorithm to a set of upstream regions of genes from two Escherichia coli regulons involved in respiration. We have demonstrated that accounting for a symmetric motif structure allows the algorithm to identify weak motifs more accurately. In the examples studied, ArcA binding sites were demonstrated to have the structure of a direct spaced repeat, whereas NarP binding sites exhibited a palindromic structure.
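The two symmetric structures named in the abstract, direct repeats and inverted (palindromic) repeats, can be illustrated with a small sketch. This is not the SeSiMCMC algorithm and not the authors' code; it only shows what the two symmetries mean for a single candidate site. The function names, the `max_mismatches` parameter and the example sequences are all hypothetical and chosen for illustration.

```python
# Illustrative sketch only: classifies ONE site by its symmetry.
# It does not perform Gibbs sampling or any motif discovery.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def split_site(site: str, half_len: int, spacer_len: int):
    """Split a site into (left half, spacer, right half)."""
    if len(site) != 2 * half_len + spacer_len:
        raise ValueError("site length must equal 2 * half_len + spacer_len")
    left = site[:half_len]
    spacer = site[half_len:half_len + spacer_len]
    right = site[half_len + spacer_len:]
    return left, spacer, right


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def classify_site(site: str, half_len: int, spacer_len: int,
                  max_mismatches: int = 0) -> list:
    """Return the symmetry labels that the site satisfies.

    'direct repeat'   : right half matches the left half.
    'inverted repeat' : right half matches the reverse complement
                        of the left half (a palindrome in the DNA sense).
    max_mismatches is a hypothetical tolerance for divergent halves.
    """
    left, _, right = split_site(site.upper(), half_len, spacer_len)
    labels = []
    if mismatches(left, right) <= max_mismatches:
        labels.append("direct repeat")
    if mismatches(left, reverse_complement(right)) <= max_mismatches:
        labels.append("inverted repeat")
    return labels


# Made-up example sites (not real ArcA or NarP binding sites).
direct_site = "TGTTA" + "ACG" + "TGTTA"
palindromic_site = "TACCC" + "AT" + reverse_complement("TACCC")
print(classify_site(direct_site, half_len=5, spacer_len=3))
print(classify_site(palindromic_site, half_len=5, spacer_len=2))
```

In a sampler that models such structure explicitly, a constraint of this kind would tie the two halves of the motif together, so that both halves contribute evidence to the same positions. How SeSiMCMC implements this is not specified in the abstract.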
Pages: 2240-2245
Number of pages: 6