MotEvo: integrated Bayesian probabilistic methods for inferring regulatory sites and motifs on multiple alignments of DNA sequences

Cited by: 58
Authors
Arnold, Phil [1]
Erb, Ionas [2, 3]
Pachkov, Mikhail [1]
Molina, Nacho [4]
van Nimwegen, Erik [1]
Affiliations
[1] Univ Basel, Biozentrum, Swiss Inst Bioinformat, CH-4056 Basel, Switzerland
[2] Ctr Genom Regulat CRG, Bioinformat & Genom Program, Barcelona, Spain
[3] Pompeu Fabra Univ UPF, Barcelona, Spain
[4] Ecole Polytech Fed Lausanne, Sch Life Sci, Lausanne, Switzerland
Funding
Swiss National Science Foundation
Keywords
FACTOR-BINDING-SITES; TRANSCRIPTION; IDENTIFICATION; PHYLOSCAN; ELEMENTS;
DOI
10.1093/bioinformatics/btr695
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Probabilistic approaches for inferring transcription factor binding sites (TFBSs) and regulatory motifs from DNA sequences have been developed for over two decades. Previous work has shown that prediction accuracy can be significantly improved by incorporating features such as the competition of multiple transcription factors (TFs) for binding to nearby sites, the tendency of TFBSs for co-regulated TFs to cluster and form cis-regulatory modules, and explicit evolutionary modeling of conservation of TFBSs across orthologous sequences. However, currently available tools incorporate only some of these features, and significant methodological hurdles have hampered their synthesis into a single consistent probabilistic framework.

Results: We present MotEvo, an integrated suite of Bayesian probabilistic methods for the prediction of TFBSs and inference of regulatory motifs from multiple alignments of phylogenetically related DNA sequences, which incorporates all of the features just mentioned. In addition, MotEvo incorporates a novel model for detecting unknown functional elements that are under evolutionary constraint, and a new robust model for treating gain and loss of TFBSs along a phylogeny. Rigorous benchmarking tests on ChIP-seq datasets show that MotEvo's novel features significantly improve the accuracy of TFBS prediction, motif inference and enhancer prediction.
Pages: 487-494
Page count: 8