BayesPeak: Bayesian analysis of ChIP-seq data

Times cited: 85
Authors
Spyrou, Christiana [1, 3]
Stark, Rory [3 ]
Lynch, Andy G. [4 ]
Tavare, Simon [2, 4]
Affiliations
[1] Ctr Math Sci, Stat Lab, Cambridge, England
[2] Ctr Math Sci, DAMTP, Cambridge, England
[3] Canc Res UK, Cambridge Res Inst, Li Ka Shing Ctr, Cambridge, England
[4] Univ Cambridge, Dept Oncol, Li Ka Shing Ctr, Cambridge, England
Source
BMC BIOINFORMATICS | 2009 / Vol. 10
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
HIDDEN MARKOV MODEL; GENOME-WIDE ANALYSIS; BINDING-SITES; TRANSCRIPTION;
DOI
10.1186/1471-2105-10-299
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: High-throughput sequencing technology has become popular and widely used to study protein and DNA interactions. Chromatin immunoprecipitation, followed by sequencing of the resulting samples, produces large amounts of data that can be used to map genomic features such as transcription factor binding sites and histone modifications.
Methods: Our proposed statistical algorithm, BayesPeak, uses a fully Bayesian hidden Markov model to detect enriched locations in the genome. The structure accommodates the natural features of the Solexa/Illumina sequencing data and allows for overdispersion in the abundance of reads in different regions. Moreover, a control sample can be incorporated in the analysis to account for experimental and sequence biases. Markov chain Monte Carlo algorithms are applied to estimate the posterior distributions of the model parameters, and posterior probabilities are used to detect the sites of interest.
Conclusion: We have presented a flexible approach for identifying peaks from ChIP-seq reads, suitable for use on both transcription factor binding and histone modification data. Our method estimates probabilities of enrichment that can be used in downstream analysis. The method is assessed using experimentally verified data and is shown to provide high-confidence calls with low false positive rates.
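Illustrative sketch (not the authors' implementation): the abstract describes a hidden Markov model over genomic windows with overdispersed read counts and posterior probabilities of enrichment. The Python below shows that general idea with a two-state HMM (background, enriched), negative binomial emissions, and the forward-backward algorithm. It is a simplification under stated assumptions: the function names, the window counts, and all parameter values (`means`, `size`, `stay`, `start`) are hypothetical, and the parameters are fixed by hand here rather than estimated by Markov chain Monte Carlo as in BayesPeak. No control sample is modeled.

```python
import math


def nb_logpmf(k, mean, size):
    """Log pmf of a negative binomial with the given mean and dispersion `size`.

    Smaller `size` means more overdispersion relative to a Poisson.
    """
    p = size / (size + mean)
    return (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
            + size * math.log(p) + k * math.log(1.0 - p))


def posterior_enrichment(counts, means=(2.0, 15.0), size=2.0,
                         stay=(0.95, 0.8), start=(0.9, 0.1)):
    """Return P(window is enriched | all counts) for each window.

    State 0 is background, state 1 is enriched. All parameter values are
    hypothetical and fixed; they are not fitted to the data.
    """
    n = len(counts)
    states = (0, 1)
    trans = [[stay[0], 1.0 - stay[0]],
             [1.0 - stay[1], stay[1]]]
    emit = [[math.exp(nb_logpmf(c, means[s], size)) for s in states]
            for c in counts]

    # Forward pass, rescaled at every step to avoid numerical underflow.
    alpha = []
    cur = [start[s] * emit[0][s] for s in states]
    norm = sum(cur)
    alpha.append([x / norm for x in cur])
    for t in range(1, n):
        cur = [emit[t][s] * sum(alpha[t - 1][r] * trans[r][s] for r in states)
               for s in states]
        norm = sum(cur)
        alpha.append([x / norm for x in cur])

    # Backward pass, also rescaled (the scale cancels in the posterior).
    beta = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        cur = [sum(trans[s][r] * emit[t + 1][r] * beta[t + 1][r] for r in states)
               for s in states]
        norm = sum(cur)
        beta[t] = [x / norm for x in cur]

    # Combine both passes into per-window posterior enrichment probabilities.
    post = []
    for t in range(n):
        w = [alpha[t][s] * beta[t][s] for s in states]
        post.append(w[1] / (w[0] + w[1]))
    return post


# Hypothetical read counts per window: a short run of high counts in the middle.
counts = [1, 2, 0, 3, 20, 25, 18, 2, 1, 0]
post = posterior_enrichment(counts)
```

In this toy setting, windows with high counts receive posterior probabilities near 1 and low-count windows receive probabilities near 0; thresholding `post` would give a crude peak call.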
Pages: 17