msCentipede: Modeling Heterogeneity across Genomic Sites and Replicates Improves Accuracy in the Inference of Transcription Factor Binding

Cited by: 21
Authors
Raj, Anil [1]
Shim, Heejung [2]
Gilad, Yoav [2]
Pritchard, Jonathan K. [1,3,4]
Stephens, Matthew [2,5]
Affiliations
[1] Stanford Univ, Dept Genet, Stanford, CA 94305 USA
[2] Univ Chicago, Dept Human Genet, Chicago, IL 60637 USA
[3] Stanford Univ, Dept Biol, Stanford, CA 94305 USA
[4] Howard Hughes Med Inst, Chevy Chase, MD USA
[5] Univ Chicago, Dept Stat, Chicago, IL 60637 USA
Source
PLOS ONE | 2015 / Vol. 10 / Issue 9
Keywords
OPEN CHROMATIN
DOI
10.1371/journal.pone.0138030
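Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09
Abstract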
Understanding global gene regulation depends critically on accurate annotation of regulatory elements that are functional in a given cell type. CENTIPEDE, a powerful, probabilistic framework for identifying transcription factor binding sites from tissue-specific DNase I cleavage patterns and genomic sequence content, leverages the hypersensitivity of factor-bound chromatin and the information in the DNase I spatial cleavage profile characteristic of each DNA binding protein to accurately infer functional factor binding sites. However, the model for the spatial profile in this framework fails to account for the substantial variation in the DNase I cleavage profiles across different binding sites. Nor does it account for variation in the profiles at the same binding site across multiple replicate DNase I experiments, which are increasingly available. In this work, we introduce new methods, based on multi-scale models for inhomogeneous Poisson processes, to account for such variation in DNase I cleavage patterns both within and across binding sites. These models account for the spatial structure in the heterogeneity in DNase I cleavage patterns for each factor. Using DNase-seq measurements assayed in a lymphoblastoid cell line, we demonstrate the improved performance of this model for several transcription factors by comparison with ChIP-seq peaks for those factors. Finally, we explore the effects of DNase I sequence bias on inference of factor binding using a simple extension of our framework that allows for a more flexible background model. The proposed model can also be easily applied to paired-end ATAC-seq and DNase-seq data. msCentipede, a Python implementation of our algorithm, is available at http://rajanil.github.io/msCentipede.
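To illustrate what a multi-scale model for an inhomogeneous Poisson process looks like, the sketch below uses the standard recursive dyadic decomposition: the total cleavage count in a window is Poisson, and at each scale the count in every segment is split between its left and right halves according to a binomial with a per-segment split probability. This is a minimal, hypothetical sketch, not the msCentipede implementation; the function names (`multiscale_splits`, `multiscale_loglik`) are illustrative assumptions, the window length is assumed to be a power of two, and the split probabilities are held fixed, whereas the paper's model lets the cleavage profile vary across binding sites and replicates.

```python
import math
from typing import List, Sequence, Tuple


def multiscale_splits(counts: Sequence[int]) -> List[List[Tuple[int, int]]]:
    """For each scale (coarse to fine), return (left_count, parent_count) per segment.

    Assumes len(counts) is a power of two (hypothetical simplification).
    """
    n = len(counts)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("window length must be a power of two")
    splits = []
    width = n
    while width > 1:
        half = width // 2
        level = []
        for start in range(0, n, width):
            left = sum(counts[start:start + half])     # reads in the left half
            parent = sum(counts[start:start + width])  # reads in the whole segment
            level.append((left, parent))
        splits.append(level)
        width = half
    return splits


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """log Binomial(k; n, p), assuming 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def multiscale_loglik(counts: Sequence[int], rate: float,
                      split_probs: Sequence[Sequence[float]]) -> float:
    """Poisson log-likelihood of the total count plus binomial splits at every scale."""
    total = sum(counts)
    ll = total * math.log(rate) - rate - math.lgamma(total + 1)
    for level, probs in zip(multiscale_splits(counts), split_probs):
        for (left, parent), p in zip(level, probs):
            ll += log_binom_pmf(left, parent, p)
    return ll
```

When each split probability equals the ratio of the left-half rate to the segment rate, this factorization reproduces the ordinary per-position Poisson likelihood, so the multi-scale form loses nothing while exposing parameters that can be modeled scale by scale:

```python
counts = [1, 0, 2, 1]
lam = [0.5, 0.3, 1.2, 0.8]                        # hypothetical per-position rates
probs = [[0.8 / 2.8], [0.5 / 0.8, 1.2 / 2.0]]      # left-rate / parent-rate at each split
ll = multiscale_loglik(counts, sum(lam), probs)
```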
Pages: 15