A general approach for discriminative de novo motif discovery from high-throughput data

Cited by: 32
Authors
Grau, Jan [1]
Posch, Stefan [1]
Grosse, Ivo [1]
Keilwagen, Jens [2, 3]
Affiliations
[1] Univ Halle Wittenberg, Inst Comp Sci, D-06099 Halle, Saale, Germany
[2] Fed Res Ctr Cultivated Plants, Julius Kuhn Inst, Inst Biosafety Plant Biotechnol, D-06484 Quedlinburg, Germany
[3] Leibniz Inst Plant Genet & Crop Plant Res IPK, Dept Mol Genet, D-06466 Seeland Ot Gatersleben, Germany
Keywords
PROTEIN-DNA INTERACTIONS; CHIP-SEQ DATA; FACTOR-BINDING SITES; TRANSCRIPTION FACTOR; POSITIONAL INFORMATION; GENOME; SPECIFICITY; RESOLUTION; SEQUENCES; NETWORK;
DOI
10.1093/nar/gkt831
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
De novo motif discovery has been an important challenge of bioinformatics for the past two decades. Since the emergence of high-throughput techniques like ChIP-seq, ChIP-exo and protein-binding microarrays (PBMs), the focus of de novo motif discovery has shifted to runtime and accuracy on large data sets. For this purpose, specialized algorithms have been designed for discovering motifs in ChIP-seq or PBM data. However, none of the existing approaches work perfectly for all three high-throughput techniques. In this article, we propose Dimont, a general approach for fast and accurate de novo motif discovery from high-throughput data. We demonstrate that Dimont yields a higher number of correct motifs from ChIP-seq data than any of the specialized approaches and achieves a higher accuracy for predicting PBM intensities from probe sequence than any of the approaches specifically designed for that purpose. Dimont also reports the expected motifs for several ChIP-exo data sets. Investigating differences between in vitro and in vivo binding, we find that for most transcription factors, the motifs discovered by Dimont are in good accordance between techniques, but we also find notable exceptions. We also observe that modeling intra-motif dependencies may increase accuracy, which indicates that more complex motif models are a worthwhile field of research.
Pages: 11