Bipartite pattern discovery by entropy minimization-based multiple local alignment

Cited by: 22
Authors
Bi, CP
Rogan, PK
Affiliations
[1] Childrens Mercy Hosp & Clin, Lab Human Mol Genet, Kansas City, MO 64108 USA
[2] Univ Missouri, Sch Med, Kansas City, MO 64110 USA
[3] Univ Missouri, Sch Comp Sci & Engn, Kansas City, MO 64110 USA
DOI
10.1093/nar/gkh825
Chinese Library Classification (CLC) numbers
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Many multimeric transcription factors recognize DNA sequence patterns by binding cooperatively to bipartite elements composed of two half sites separated by a flexible spacer. We developed a novel algorithm, bipartite pattern discovery (Bipad), which discovers bipartite sequence patterns using a mathematical model based on the principle of information maximization or, equivalently, Shannon entropy minimization. Bipad is a C++ program that applies greedy methods to search the bipartite alignment space, examining the upstream or downstream regions of co-regulated genes for cis-regulatory bipartite patterns. It requires an input sequence file with zero or one site per locus, together with the widths of the left and right motifs and a range of possible gap lengths. Bipad can run in either single-block or bipartite pattern search mode, and it can comprehensively search all four orientations of half-site patterns. Simulation studies showed that the accuracy of this motif discovery algorithm depends on sample size and motif conservation level but is independent of background composition. Bipad performed as well as or better than other pattern search algorithms in correctly identifying Escherichia coli cyclic AMP receptor protein and Bacillus subtilis sigma factor binding site sequences, based on experimentally defined benchmarks. Finally, we derived a new bipartite information weight matrix for vitamin D3 receptor/retinoid X receptor alpha (VDR/RXRalpha) binding sites that comprehensively models the natural variability inherent in these sequence elements.
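The abstract describes the search only at a high level, so the following Python sketch illustrates the general idea of entropy-minimizing bipartite alignment under assumptions of our own: a uniform 0.25 background (so column information is 2 - H bits and maximizing total information is the same as minimizing total entropy), a pseudocount of 0.5 per base, exactly one site per sequence (Bipad itself allows zero or one), half sites concatenated with the spacer removed, and a simple deterministic coordinate-ascent search. All function names are hypothetical; this is not Bipad's C++ implementation or its exact scoring.

```python
import math
from collections import Counter
from typing import Iterator

BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# (start, gap, (rc_left, rc_right), gap-free site string)
Candidate = tuple[int, int, tuple[bool, bool], str]


def revcomp(s: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return s.translate(_COMPLEMENT)[::-1]


def column_information(column: str, pseudocount: float = 0.5) -> float:
    """Information (bits) of one aligned column, 2 - H, against a uniform background."""
    counts = Counter(column)
    total = len(column) + 4 * pseudocount
    h = 0.0
    # Iterate over sorted counts so columns with the same count profile give identical floats.
    for c in sorted(counts[b] for b in BASES):
        p = (c + pseudocount) / total
        h -= p * math.log2(p)
    return 2.0 - h


def alignment_information(sites: list[str]) -> float:
    """Total information of equal-length aligned sites (2 * width minus total entropy)."""
    return sum(column_information("".join(col)) for col in zip(*sites))


def candidate_sites(seq: str, w1: int, w2: int, gap_min: int, gap_max: int) -> Iterator[Candidate]:
    """Yield every bipartite placement in seq: each gap length, each start position, and
    each of the four half-site orientations (either half may be reverse-complemented).
    The spacer is dropped, so every candidate site has length w1 + w2."""
    for gap in range(gap_min, gap_max + 1):
        span = w1 + gap + w2
        for pos in range(len(seq) - span + 1):
            left, right = seq[pos:pos + w1], seq[pos + w1 + gap:pos + span]
            for rc_left in (False, True):
                for rc_right in (False, True):
                    site = (revcomp(left) if rc_left else left) + (revcomp(right) if rc_right else right)
                    yield pos, gap, (rc_left, rc_right), site


def greedy_bipartite(seqs: list[str], w1: int, w2: int, gap_min: int, gap_max: int,
                     max_rounds: int = 20) -> tuple[list[Candidate], float]:
    """Greedy coordinate ascent: re-place one sequence's site at a time and keep the move
    only if total alignment information strictly increases (so the loop cannot cycle).
    Assumes exactly one site per sequence."""
    chosen: list[Candidate] = []
    for s in seqs:
        first = next(candidate_sites(s, w1, w2, gap_min, gap_max), None)
        if first is None:
            raise ValueError("sequence shorter than w1 + gap_min + w2")
        chosen.append(first)  # deterministic start: first placement in each sequence
    for _ in range(max_rounds):
        changed = False
        for i, seq in enumerate(seqs):
            others = [c[3] for j, c in enumerate(chosen) if j != i]
            current = alignment_information(others + [chosen[i][3]])
            best = max(candidate_sites(seq, w1, w2, gap_min, gap_max),
                       key=lambda c: alignment_information(others + [c[3]]))
            if alignment_information(others + [best[3]]) > current + 1e-9:
                chosen[i] = best
                changed = True
        if not changed:
            break
    return chosen, alignment_information([c[3] for c in chosen])
```

As a toy usage example (illustrative sequences, not data from the paper), planting the half sites AAC and GGT as AAC..GGT, GTT..GGT and AAC..ACC in `["AACTTGGT", "GTTCCGGT", "AACGAACC"]` and calling `greedy_bipartite(seqs, 3, 3, 2, 2)` aligns all three as `AACGGT`, with orientation flags `(False, False)`, `(True, False)` and `(False, True)`. Keeping only strictly improving moves makes the search terminate, but like any greedy method it can stop at a local optimum.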
Pages: 4979-4991
Page count: 13