Discovering Transcription Factor Binding Sites in Highly Repetitive Regions of Genomes with Multi-Read Analysis of ChIP-Seq Data

Cited by: 74
Authors
Chung, Dongjun [1 ,2 ]
Kuan, Pei Fen [3 ]
Li, Bo [4 ]
Sanalkumar, Rajendran [5 ]
Liang, Kun [1 ,2 ]
Bresnick, Emery H. [5 ]
Dewey, Colin [2 ,4 ]
Keles, Suenduez [1 ,2 ]
Affiliations
[1] Univ Wisconsin, Dept Stat, Madison, WI 53706 USA
[2] Univ Wisconsin, Dept Biostat & Med Informat, Madison, WI 53706 USA
[3] Univ N Carolina, Dept Biostat, Chapel Hill, NC USA
[4] Univ Wisconsin, Dept Comp Sci, Madison, WI 53706 USA
[5] Univ Wisconsin, Sch Med & Publ Hlth, Dept Cell & Regenerat Biol, Wisconsin Inst Med Res,UW Carbone Canc Ctr, Madison, WI 53706 USA
Funding
National Institutes of Health (USA);
Keywords
SEGMENTAL DUPLICATIONS; WIDE ANALYSIS; GENE; EVOLUTION; ELEMENTS; BIOINFORMATICS; IDENTIFICATION; ENRICHMENT; STRATEGY; UPDATE;
DOI
10.1371/journal.pcbi.1002111
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is rapidly replacing chromatin immunoprecipitation combined with genome-wide tiling array analysis (ChIP-chip) as the preferred approach for mapping transcription-factor binding sites and chromatin modifications. The state of the art for analyzing ChIP-seq data relies on using only reads that map uniquely to a relevant reference genome (uni-reads). This can lead to the omission of up to 30% of alignable reads. We describe a general approach for utilizing reads that map to multiple locations on the reference genome (multi-reads). Our approach is based on allocating multi-reads as fractional counts using a weighted alignment scheme. Using human STAT1 and mouse GATA1 ChIP-seq datasets, we illustrate that incorporation of multi-reads significantly increases sequencing depths, leads to detection of novel peaks that are not otherwise identifiable with uni-reads, and improves detection of peaks in mappable regions. We investigate various genome-wide characteristics of peaks detected only by utilization of multi-reads via computational experiments. Overall, peaks from multi-read analysis have similar characteristics to peaks that are identified by uni-reads, except that the majority of them reside in segmental duplications. We further validate a number of GATA1 multi-read-only peaks by independent quantitative real-time ChIP analysis and identify novel target genes of GATA1. These computational and experimental results establish that multi-reads can be of critical importance for studying transcription factor binding in highly repetitive regions of genomes with ChIP-seq experiments.
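The idea of allocating multi-reads as fractional counts can be illustrated with a minimal sketch. The abstract does not specify how the alignment weights are computed, so the weighting used here (proportional to the uni-read count at each candidate locus plus a pseudocount) is an assumption chosen for illustration, not the authors' scheme. The function name `allocate_multireads`, its parameters, and the locus labels are hypothetical.

```python
from collections import defaultdict


def allocate_multireads(uni_counts, multi_reads, pseudocount=1.0):
    """Split each multi-read across its candidate loci as fractional counts.

    uni_counts:  dict mapping locus -> number of uniquely mapped reads.
    multi_reads: list of reads, each given as a list of candidate loci.
    pseudocount: assumed smoothing term so loci without uni-reads
                 still receive a nonzero share (illustrative choice).
    """
    # Start from the uni-read counts.
    counts = defaultdict(float)
    for locus, n in uni_counts.items():
        counts[locus] += n

    for candidates in multi_reads:
        # Assumed weight: local uni-read support plus the pseudocount.
        weights = [uni_counts.get(locus, 0) + pseudocount for locus in candidates]
        total = sum(weights)
        # Each read contributes exactly one count in total,
        # divided among its candidate loci in proportion to the weights.
        for locus, w in zip(candidates, weights):
            counts[locus] += w / total

    return dict(counts)


uni = {"chr1:100": 3, "chr2:500": 0}
multi = [["chr1:100", "chr2:500"]]
result = allocate_multireads(uni, multi)
```

In this example the single multi-read is split 4:1 between the two loci, so the locus with more uni-read support receives the larger fraction while the total read count is preserved.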
Pages: 17