Practical Guidelines for the Comprehensive Analysis of ChIP-seq Data

Cited by: 148
Authors
Bailey, Timothy [1]
Krajewski, Pawel [2]
Ladunga, Istvan [3]
Lefebvre, Celine [4]
Li, Qunhua [5]
Liu, Tao [6]
Madrigal, Pedro [2]
Taslim, Cenny [7]
Zhang, Jie [7]
Affiliations
[1] Univ Queensland, Inst Mol Biosci, Brisbane, Qld, Australia
[2] Polish Acad Sci, Dept Biometry & Bioinformat, Inst Plant Genet, Poznan, Poland
[3] Univ Nebraska, Dept Stat, Beadle Ctr, Lincoln, NE USA
[4] Canc Inst Gustave Roussy, INSERM, U981, Villejuif, France
[5] Penn State Univ, Dept Stat, University Pk, PA 16802 USA
[6] SUNY Buffalo, Dept Biochem, Buffalo, NY 14214 USA
[7] Ohio State Univ, Dept Biomed Informat, Columbus, OH 43210 USA
Funding
U.S. National Institutes of Health;
Keywords
DIFFERENTIAL EXPRESSION ANALYSIS; GENOME-WIDE IDENTIFICATION; DNA BINDING-SITES; R-PACKAGE; BIOCONDUCTOR PACKAGE; NORMALIZATION; ALGORITHM; ALIGNMENT; SOFTWARE; PLATFORM;
DOI
10.1371/journal.pcbi.1003326
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Mapping the chromosomal locations of transcription factors, nucleosomes, histone modifications, chromatin remodeling enzymes, chaperones, and polymerases is one of the key tasks of modern biology, as evidenced by the Encyclopedia of DNA Elements (ENCODE) Project. To this end, chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is the standard methodology. Mapping such protein-DNA interactions in vivo using ChIP-seq presents multiple challenges not only in sample preparation and sequencing but also for computational analysis. Here, we present step-by-step guidelines for the computational analysis of ChIP-seq data. We address all the major steps in the analysis of ChIP-seq data: sequencing depth selection, quality checking, mapping, data normalization, assessment of reproducibility, peak calling, differential binding analysis, controlling the false discovery rate, peak annotation, visualization, and motif analysis. At each step in our guidelines we discuss some of the software tools most frequently used. We also highlight the challenges and problems associated with each step in ChIP-seq data analysis. We present a concise workflow for the analysis of ChIP-seq data in Figure 1 that complements and expands on the recommendations of the ENCODE and modENCODE projects. Each step in the workflow is described in detail in the following sections.
Pages: 8