SeqGL Identifies Context-Dependent Binding Signals in Genome-Wide Regulatory Element Maps

Cited by: 46
Authors
Setty, Manu [1]
Leslie, Christina S. [1]
Affiliations
[1] Mem Sloan Kettering Canc Ctr, Computat Biol Program, New York, NY 10021 USA
Keywords
TRANSCRIPTION FACTORS; CHROMATIN; SEQUENCE; DATABASE; SPECIFICATION; ENHANCERS; DISCOVERY; SITES;
DOI
10.1371/journal.pcbi.1004271
Chinese Library Classification
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Genome-wide maps of transcription factor (TF) occupancy and regions of open chromatin implicitly contain DNA sequence signals for multiple factors. We present SeqGL, a novel de novo motif discovery algorithm to identify multiple TF sequence signals from ChIP-, DNase-, and ATAC-seq profiles. SeqGL trains a discriminative model using a k-mer feature representation together with group lasso regularization to extract a collection of sequence signals that distinguish peak sequences from flanking regions. Benchmarked on over 100 ChIP-seq experiments, SeqGL outperformed traditional motif discovery tools in discriminative accuracy. Furthermore, SeqGL can be naturally used with multitask learning to identify genomic and cell-type context determinants of TF binding. SeqGL successfully scales to the large multiplicity of sequence signals in DNase- or ATAC-seq maps. In particular, SeqGL was able to identify a number of ChIP-seq validated sequence signals that were not found by traditional motif discovery algorithms. Thus, compared to widely used motif discovery algorithms, SeqGL demonstrates both greater discriminative accuracy and higher sensitivity for detecting the DNA sequence signals underlying regulatory element maps. SeqGL is available at http://cbio.mskcc.org/public/Leslie/SeqGL/.
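The core idea in the abstract (k-mer features plus group lasso regularization in a discriminative model) can be illustrated with a minimal sketch. This is not the SeqGL implementation: the function names, the logistic loss, the proximal gradient solver, and the parameters `lam`, `lr`, and `n_iter` are all assumptions made for illustration, and the k-mer groups are assumed to be supplied by the caller, since the abstract does not say how they are formed.

```python
import itertools
import numpy as np

def kmer_counts(seq, k):
    """Count each k-mer over the alphabet ACGT in seq (hypothetical helper)."""
    index = {"".join(p): i
             for i, p in enumerate(itertools.product("ACGT", repeat=k))}
    x = np.zeros(len(index))
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])  # k-mers with other symbols (e.g. N) are skipped
        if j is not None:
            x[j] += 1
    return x

def group_soft_threshold(w, groups, t):
    """Proximal operator of the group lasso penalty.

    Each group's weight vector has its L2 norm shrunk by t; groups whose
    norm is at most t are set to zero as a whole.
    """
    out = np.zeros_like(w)
    for g in groups:
        norm = np.linalg.norm(w[g])
        if norm > t:
            out[g] = (1 - t / norm) * w[g]
    return out

def fit_group_lasso_logistic(X, y, groups, lam=0.1, lr=0.1, n_iter=500):
    """Proximal gradient descent on mean logistic loss + lam * sum_g ||w_g||_2.

    X: (n_samples, n_features) k-mer count matrix.
    y: labels in {0, 1} (1 = peak sequence, 0 = flanking sequence).
    groups: list of lists of feature indices (assumed given, non-overlapping).
    """
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ w))       # predicted probability of "peak"
        grad = X.T @ (p - y) / len(y)      # gradient of the mean logistic loss
        w = group_soft_threshold(w - lr * grad, groups, lr * lam)
    return w
```

In this sketch, a group whose k-mers do not help separate peaks from flanks is driven to exactly zero, so the surviving groups can be read as candidate sequence signals.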
Pages: 21