Discovering Transcription Factor Binding Sites in Highly Repetitive Regions of Genomes with Multi-Read Analysis of ChIP-Seq Data

Cited by: 74
Authors
Chung, Dongjun [1, 2]
Kuan, Pei Fen [3]
Li, Bo [4]
Sanalkumar, Rajendran [5]
Liang, Kun [1, 2]
Bresnick, Emery H. [5]
Dewey, Colin [2, 4]
Keleş, Sündüz [1, 2]
Affiliations
[1] Univ Wisconsin, Dept Stat, Madison, WI 53706 USA
[2] Univ Wisconsin, Dept Biostat & Med Informat, Madison, WI 53706 USA
[3] Univ N Carolina, Dept Biostat, Chapel Hill, NC USA
[4] Univ Wisconsin, Dept Comp Sci, Madison, WI 53706 USA
[5] Univ Wisconsin, Sch Med & Publ Hlth, Dept Cell & Regenerat Biol, Wisconsin Inst Med Res, UW Carbone Canc Ctr, Madison, WI 53706 USA
Funding
National Institutes of Health (USA);
Keywords
SEGMENTAL DUPLICATIONS; WIDE ANALYSIS; GENE; EVOLUTION; ELEMENTS; BIOINFORMATICS; IDENTIFICATION; ENRICHMENT; STRATEGY; UPDATE;
DOI
10.1371/journal.pcbi.1002111
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is rapidly replacing chromatin immunoprecipitation combined with genome-wide tiling array analysis (ChIP-chip) as the preferred approach for mapping transcription-factor binding sites and chromatin modifications. The state of the art for analyzing ChIP-seq data relies on using only reads that map uniquely to a relevant reference genome (uni-reads). This can lead to the omission of up to 30% of alignable reads. We describe a general approach for utilizing reads that map to multiple locations on the reference genome (multi-reads). Our approach is based on allocating multi-reads as fractional counts using a weighted alignment scheme. Using human STAT1 and mouse GATA1 ChIP-seq datasets, we illustrate that incorporation of multi-reads significantly increases sequencing depth, leads to detection of novel peaks that are not otherwise identifiable with uni-reads, and improves detection of peaks in mappable regions. Via computational experiments, we investigate various genome-wide characteristics of peaks detected only by utilization of multi-reads. Overall, peaks from multi-read analysis have similar characteristics to peaks that are identified by uni-reads, except that the majority of them reside in segmental duplications. We further validate a number of GATA1 multi-read-only peaks by independent quantitative real-time ChIP analysis and identify novel target genes of GATA1. These computational and experimental results establish that multi-reads can be of critical importance for studying transcription factor binding in highly repetitive regions of genomes with ChIP-seq experiments.
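The idea of allocating multi-reads as fractional counts can be illustrated with a minimal sketch. This is not the authors' weighted alignment scheme, which the abstract does not specify. It assumes each candidate alignment already carries a non-negative weight (a hypothetical input, for example derived from alignment quality) and splits each read's single count across its candidate positions in proportion to those weights. The function name `allocate_fractional_counts` and the position labels are illustrative assumptions.

```python
from collections import defaultdict


def allocate_fractional_counts(alignments):
    """Split each read's unit count across its candidate positions.

    alignments: dict mapping read_id -> list of (position, weight) pairs.
    The weights are hypothetical inputs; a uni-read has exactly one pair.
    Returns a dict mapping position -> fractional read count.
    """
    counts = defaultdict(float)
    for read_id, hits in alignments.items():
        total = sum(weight for _, weight in hits)
        if total <= 0:
            # No usable weight: skip the read rather than guess a placement.
            continue
        for position, weight in hits:
            # Each read contributes exactly 1.0 in total across its hits.
            counts[position] += weight / total
    return dict(counts)


# Illustrative input: one uni-read and one multi-read with two candidate sites.
example = {
    "read_a": [("chr1:100", 1.0)],
    "read_b": [("chr1:100", 3.0), ("chr5:900", 1.0)],
}
counts = allocate_fractional_counts(example)
```

Because every read contributes a total of one count, the fractional counts sum to the number of allocated reads, so adding multi-reads increases depth without counting any read more than once.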
Pages: 17