Computational discovery of gene regulatory binding motifs: A Bayesian perspective

Cited by: 56
Authors
Jensen, ST
Liu, XS
Zhou, Q
Liu, JS
Affiliations
[1] Harvard Univ, Dept Stat, Cambridge, MA 02138 USA
[2] Harvard Univ, Sch Publ Hlth, Dept Biostat, Boston, MA 02115 USA
[3] Dana Farber Canc Inst, Boston, MA 02115 USA
Keywords
gene regulation; motif discovery; Bayesian models; scoring functions; optimization; Markov chain Monte Carlo
DOI
10.1214/088342304000000107
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
The Bayesian approach together with Markov chain Monte Carlo techniques has provided an attractive solution to many important bioinformatics problems such as multiple sequence alignment, microarray analysis and the discovery of gene regulatory binding motifs. The employment of such methods and, more broadly, explicit statistical modeling, has revolutionized the field of computational biology. After reviewing several heuristics-based computational methods, this article presents a systematic account of Bayesian formulations and solutions to the motif discovery problem. Generalizations are made to further enhance the Bayesian approach. Motivated by the need for a speedy algorithm, we also provide a perspective on the problem from the viewpoint of optimizing a scoring function. We observe that scoring functions resulting from proper posterior distributions, or approximations to such distributions, show the best performance and can be used to improve upon existing motif-finding programs. Simulation analyses and a real-data example are used to support our observation.
Pages: 188-204
Page count: 17