Practical Guidelines for the Comprehensive Analysis of ChIP-seq Data

Cited by: 148
Authors
Bailey, Timothy [1]
Krajewski, Pawel [2]
Ladunga, Istvan [3]
Lefebvre, Celine [4]
Li, Qunhua [5]
Liu, Tao [6]
Madrigal, Pedro [2]
Taslim, Cenny [7]
Zhang, Jie [7]
Affiliations
[1] Univ Queensland, Inst Mol Biosci, Brisbane, Qld, Australia
[2] Polish Acad Sci, Dept Biometry & Bioinformat, Inst Plant Genet, Poznan, Poland
[3] Univ Nebraska, Dept Stat, Beadle Ctr, Lincoln, NE USA
[4] Canc Inst Gustave Roussy, INSERM, U981, Villejuif, France
[5] Penn State Univ, Dept Stat, University Pk, PA 16802 USA
[6] SUNY Buffalo, Dept Biochem, Buffalo, NY 14214 USA
[7] Ohio State Univ, Dept Biomed Informat, Columbus, OH 43210 USA
Funding
National Institutes of Health (USA);
Keywords
DIFFERENTIAL EXPRESSION ANALYSIS; GENOME-WIDE IDENTIFICATION; DNA BINDING-SITES; R-PACKAGE; BIOCONDUCTOR PACKAGE; NORMALIZATION; ALGORITHM; ALIGNMENT; SOFTWARE; PLATFORM;
DOI
10.1371/journal.pcbi.1003326
Chinese Library Classification
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Mapping the chromosomal locations of transcription factors, nucleosomes, histone modifications, chromatin remodeling enzymes, chaperones, and polymerases is one of the key tasks of modern biology, as evidenced by the Encyclopedia of DNA Elements (ENCODE) Project. To this end, chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is the standard methodology. Mapping such protein-DNA interactions in vivo using ChIP-seq presents multiple challenges not only in sample preparation and sequencing but also for computational analysis. Here, we present step-by-step guidelines for the computational analysis of ChIP-seq data. We address all the major steps in the analysis of ChIP-seq data: sequencing depth selection, quality checking, mapping, data normalization, assessment of reproducibility, peak calling, differential binding analysis, controlling the false discovery rate, peak annotation, visualization, and motif analysis. At each step in our guidelines we discuss some of the software tools most frequently used. We also highlight the challenges and problems associated with each step in ChIP-seq data analysis. We present a concise workflow for the analysis of ChIP-seq data in Figure 1 that complements and expands on the recommendations of the ENCODE and modENCODE projects. Each step in the workflow is described in detail in the following sections.
Pages: 8