Practical Guidelines for the Comprehensive Analysis of ChIP-seq Data

Cited by: 148
Authors
Bailey, Timothy [1]
Krajewski, Pawel [2]
Ladunga, Istvan [3]
Lefebvre, Celine [4]
Li, Qunhua [5]
Liu, Tao [6]
Madrigal, Pedro [2]
Taslim, Cenny [7]
Zhang, Jie [7]
Affiliations
[1] Univ Queensland, Inst Mol Biosci, Brisbane, Qld, Australia
[2] Polish Acad Sci, Dept Biometry & Bioinformat, Inst Plant Genet, Poznan, Poland
[3] Univ Nebraska, Dept Stat, Beadle Ctr, Lincoln, NE USA
[4] Canc Inst Gustave Roussy, INSERM, U981, Villejuif, France
[5] Penn State Univ, Dept Stat, University Pk, PA 16802 USA
[6] SUNY Buffalo, Dept Biochem, Buffalo, NY 14214 USA
[7] Ohio State Univ, Dept Biomed Informat, Columbus, OH 43210 USA
Funding
National Institutes of Health (USA);
Keywords
DIFFERENTIAL EXPRESSION ANALYSIS; GENOME-WIDE IDENTIFICATION; DNA BINDING-SITES; R-PACKAGE; BIOCONDUCTOR PACKAGE; NORMALIZATION; ALGORITHM; ALIGNMENT; SOFTWARE; PLATFORM;
DOI
10.1371/journal.pcbi.1003326
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Mapping the chromosomal locations of transcription factors, nucleosomes, histone modifications, chromatin remodeling enzymes, chaperones, and polymerases is one of the key tasks of modern biology, as evidenced by the Encyclopedia of DNA Elements (ENCODE) Project. To this end, chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is the standard methodology. Mapping such protein-DNA interactions in vivo using ChIP-seq presents multiple challenges not only in sample preparation and sequencing but also for computational analysis. Here, we present step-by-step guidelines for the computational analysis of ChIP-seq data. We address all the major steps in the analysis of ChIP-seq data: sequencing depth selection, quality checking, mapping, data normalization, assessment of reproducibility, peak calling, differential binding analysis, controlling the false discovery rate, peak annotation, visualization, and motif analysis. At each step in our guidelines we discuss some of the software tools most frequently used. We also highlight the challenges and problems associated with each step in ChIP-seq data analysis. We present a concise workflow for the analysis of ChIP-seq data in Figure 1 that complements and expands on the recommendations of the ENCODE and modENCODE projects. Each step in the workflow is described in detail in the following sections.
Pages: 8