On the detection and refinement of transcription factor binding sites using ChIP-Seq data

Cited by: 76
Authors
Hu, Ming [1, 2]
Yu, Jindan [3, 4, 5, 6]
Taylor, Jeremy M. G. [2, 5]
Chinnaiyan, Arul M. [3, 4, 5, 7, 8]
Qin, Zhaohui S. [1, 2]
Affiliations
[1] Univ Michigan, Ctr Stat Genet, Ann Arbor, MI 48109 USA
[2] Univ Michigan, Dept Biostat, Ann Arbor, MI 48109 USA
[3] Univ Michigan, Michigan Ctr Translat Pathol, Ann Arbor, MI 48109 USA
[4] Univ Michigan, Dept Pathol, Ann Arbor, MI 48109 USA
[5] Univ Michigan, Ctr Comprehens Canc, Ann Arbor, MI 48109 USA
[6] Northwestern Univ, Dept Med, Div Hematol Oncol, Chicago, IL 60660 USA
[7] Univ Michigan, Howard Hughes Med Inst, Ann Arbor, MI 48109 USA
[8] Univ Michigan, Dept Urol, Sch Med, Ann Arbor, MI 48109 USA
Keywords
PROTEIN-DNA INTERACTIONS; HUMAN GENOME; CHROMATIN-IMMUNOPRECIPITATION; MOTIF DISCOVERY; SEQUENCES; MODEL; EXPRESSION; IDENTIFICATION; PATTERNS; GENE;
DOI
10.1093/nar/gkp1180
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
Coupling chromatin immunoprecipitation (ChIP) with recently developed massively parallel sequencing technologies has enabled genome-wide detection of protein-DNA interactions with unprecedented sensitivity and specificity. This new technology, ChIP-Seq, presents opportunities for in-depth analysis of transcription regulation. In this study, we explore the value of using ChIP-Seq data to better detect and refine transcription factor binding sites (TFBS). We introduce a novel computational algorithm named Hybrid Motif Sampler (HMS), specifically designed for TFBS motif discovery in ChIP-Seq data. We propose a Bayesian model that incorporates sequencing depth information to aid motif identification. Our model also allows intra-motif dependency to describe more accurately the underlying motif pattern. Our algorithm combines stochastic sampling and deterministic 'greedy' search steps into a novel hybrid iterative scheme. This combination accelerates the computation process. Simulation studies demonstrate favorable performance of HMS compared to other existing methods. When applying HMS to real ChIP-Seq datasets, we find that (i) the accuracy of existing TFBS motif patterns can be significantly improved; and (ii) there is significant intra-motif dependency inside all the TFBS motifs we tested; modeling these dependencies further improves the accuracy of these TFBS motif patterns. These findings may offer new biological insights into the mechanisms of transcription factor regulation.
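The idea of letting sequencing depth inform motif location, and of mixing stochastic sampling with a greedy step, can be illustrated with a small sketch. This is not the HMS implementation: the depth-proportional prior, the pseudocount, the independent-column position weight matrix (no intra-motif dependency), and all function names below are assumptions made only for illustration.

```python
import math
import random


def log_likelihood_ratio(window, pwm, background):
    """Sum over columns of log(P(base | motif column) / P(base | background))."""
    return sum(math.log(pwm[j][b] / background[b]) for j, b in enumerate(window))


def start_posterior(seq, depth, pwm, background):
    """Posterior over motif start positions in one sequence.

    Assumed prior (hypothetical): proportional to the mean read depth over
    the candidate window, plus a pseudocount of 1.
    """
    w = len(pwm)
    n = len(seq) - w + 1
    scores = []
    for i in range(n):
        prior = sum(depth[i:i + w]) / w + 1.0
        scores.append(math.log(prior)
                      + log_likelihood_ratio(seq[i:i + w], pwm, background))
    # Normalise in log space for numerical stability.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [x / total for x in weights]


def sampled_start(posterior, rng):
    """Stochastic step: draw a start position from the posterior."""
    return rng.choices(range(len(posterior)), weights=posterior, k=1)[0]


def greedy_start(posterior):
    """Deterministic 'greedy' step: take the most probable start position."""
    return max(range(len(posterior)), key=posterior.__getitem__)


# Toy input: the motif TGA occurs at positions 2 and 7; read depth peaks near 7.
background = {b: 0.25 for b in "ACGT"}
pwm = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
]
seq = "CCTGACCTGACC"
depth = [1, 1, 1, 1, 2, 5, 20, 40, 40, 20, 5, 1]

posterior = start_posterior(seq, depth, pwm, background)
print(greedy_start(posterior))                     # 7
print(sampled_start(posterior, random.Random(0)))  # a random draw
```

Both occurrences of TGA match the matrix equally well, so in this toy example only the depth-based prior separates them; with a flat depth profile the two sites would receive equal posterior mass.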
Pages: 2154-2167
Page count: 14