BioOptimizer: a Bayesian scoring function approach to motif discovery

Times cited: 68
Authors
Jensen, ST [1]
Liu, JS [1]
Affiliation
[1] Harvard Univ, Dept Stat, Cambridge, MA 02138 USA
DOI
10.1093/bioinformatics/bth127
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: Transcription factors (TFs) bind directly to short segments of the genome, often within hundreds to thousands of base pairs upstream of gene transcription start sites, to regulate gene expression. Experimental determination of TF binding sites is expensive and time-consuming. Many motif-finding programs have been developed, but none is clearly superior in all situations, and practitioners often find it difficult to judge which of the motifs predicted by these algorithms are most likely to be biologically relevant. Results: We derive a comprehensive scoring function based on a full Bayesian model that can handle unknown site abundance, unknown motif width, and two-block motifs with variable-length gaps. An algorithm called BioOptimizer is proposed to optimize this scoring function so as to reduce noise in the motif signal found by any motif-finding program. BioOptimizer can be used in conjunction with several existing programs, and in both simulation studies and applications to sets of co-regulated genes in bacteria its accuracy is shown to be superior to that of any of these motif-finding programs used alone. In addition, this scoring function formulation enables us to objectively compare different predicted motifs and select the optimal ones, effectively combining the strengths of existing programs.
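The abstract describes the scoring function only at a high level. The following is a simplified, hypothetical illustration of the general idea of scoring a predicted motif with a Bayesian model and then optimizing that score to remove noisy sites; it is not BioOptimizer's actual scoring function or algorithm. It assumes fixed-width, single-block sites, a product-Dirichlet prior on motif columns (pseudocount `alpha`), a fixed background base distribution, and a constant per-site prior log-odds term (`site_log_odds`) standing in for a prior on site abundance. Unknown motif width and two-block motifs with gaps are not modeled. The function names and default parameter values are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

ALPHABET = "ACGT"


def column_counts(sites: Sequence[str]) -> List[Dict[str, int]]:
    """Count each base in each column of a set of equal-width aligned sites."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same width")
    counts = [{b: 0 for b in ALPHABET} for _ in range(width)]
    for s in sites:
        for j, base in enumerate(s):
            counts[j][base] += 1
    return counts


def log_bayes_factor(sites: Sequence[str], background: Dict[str, float],
                     alpha: float = 0.5) -> float:
    """log P(sites | Dirichlet motif model) - log P(sites | background)."""
    if not sites:
        return 0.0
    n = len(sites)
    a_total = alpha * len(ALPHABET)
    score = 0.0
    for col in column_counts(sites):
        # Dirichlet-multinomial marginal likelihood of one motif column,
        # with the column's base frequencies integrated out.
        score += math.lgamma(a_total) - math.lgamma(a_total + n)
        for b in ALPHABET:
            score += math.lgamma(alpha + col[b]) - math.lgamma(alpha)
            # Subtract the log-likelihood of the same bases under the background.
            score -= col[b] * math.log(background[b])
    return score


def motif_score(sites: Sequence[str], background: Dict[str, float],
                alpha: float = 0.5, site_log_odds: float = -5.0) -> float:
    """Hypothetical score: Bayes factor plus a per-site prior log-odds term."""
    return log_bayes_factor(sites, background, alpha) + site_log_odds * len(sites)


def prune_sites(sites: Sequence[str], background: Dict[str, float],
                alpha: float = 0.5,
                site_log_odds: float = -5.0) -> Tuple[List[str], float]:
    """Best-improvement hill climb that drops sites while the score increases."""
    current = list(sites)
    best = motif_score(current, background, alpha, site_log_odds)
    while len(current) > 1:
        trials = [current[:i] + current[i + 1:] for i in range(len(current))]
        scored = [(motif_score(t, background, alpha, site_log_odds), t)
                  for t in trials]
        top_score, top_sites = max(scored, key=lambda pair: pair[0])
        if top_score <= best:
            break  # no single removal improves the score
        best, current = top_score, top_sites
    return current, best
```

On a toy input of eight `TTGACA` sites plus one unrelated `GCGCGC` site with a uniform background, this hill climb removes `GCGCGC` and keeps the other eight. A fuller optimizer would also propose new sites, site shifts, and width changes, which this removal-only sketch omits.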
Pages: 1557-1564
Number of pages: 8