Systematic bias in high-throughput sequencing data and its correction by BEADS

Cited by: 113
Authors
Cheung, Ming-Sin
Down, Thomas A.
Latorre, Isabel
Ahringer, Julie [1]
Affiliation
[1] Univ Cambridge, Gurdon Inst, Cambridge CB2 1QN, England
Funding
Wellcome Trust (UK);
Keywords
CHIP-SEQ DATA; GENOME; IDENTIFICATION; SOFTWARE; GENES; SITES;
DOI
10.1093/nar/gkr425
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification codes
071010 ; 081704 ;
Abstract
Genomic sequences obtained through high-throughput sequencing are not uniformly distributed across the genome. For example, sequencing data of total genomic DNA show significant, yet unexpected, enrichments on promoters and exons. This systematic bias is a particular problem for techniques such as chromatin immunoprecipitation, where the signal for a target factor is plotted across genomic features. We have focused on data obtained from Illumina's Genome Analyser platform, where at least three factors contribute to sequence bias: GC content, mappability of sequencing reads, and regional biases that might be generated by local structure. We show that relying on input control as a normalizer is not generally appropriate due to sample-to-sample variation in bias. To correct sequence bias, we present BEADS (bias elimination algorithm for deep sequencing), a simple three-step normalization scheme that successfully unmasks real binding patterns in ChIP-seq data. We suggest that this procedure be done routinely prior to data interpretation and downstream analyses.
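The abstract names three bias sources (GC content, mappability, regional bias) but does not spell out the steps. The following Python sketch is only an illustration of how a three-step correction over fixed-width genomic bins might look. It is not the BEADS implementation: the function names, the per-bin representation, the GC stratification, the `min_map` threshold, and the use of a corrected control track for the regional step are all assumptions made for this example.

```python
from collections import defaultdict


def gc_correct(counts, gc, n_strata=10):
    """Rescale bin counts so every GC stratum has the same mean count.

    Assumption: `gc` holds the GC fraction (0..1) of each bin.
    """
    overall = sum(counts) / len(counts)
    strata = defaultdict(list)
    for c, g in zip(counts, gc):
        strata[min(int(g * n_strata), n_strata - 1)].append(c)
    stratum_mean = {s: sum(v) / len(v) for s, v in strata.items()}
    out = []
    for c, g in zip(counts, gc):
        m = stratum_mean[min(int(g * n_strata), n_strata - 1)]
        out.append(c * overall / m if m > 0 else 0.0)
    return out


def mappability_correct(counts, mappability, min_map=0.25):
    """Divide by the mappable fraction; mask (None) mostly unmappable bins.

    Assumption: `min_map` is an arbitrary illustrative cutoff.
    """
    return [c / m if m >= min_map else None
            for c, m in zip(counts, mappability)]


def regional_correct(sample, control):
    """Divide the sample by a control track corrected in the same way."""
    return [s / c if s is not None and c not in (None, 0) else None
            for s, c in zip(sample, control)]


def normalize(sample_counts, control_counts, gc, mappability):
    """Apply the three illustrative steps in order."""
    def first_two_steps(counts):
        return mappability_correct(gc_correct(counts, gc), mappability)
    return regional_correct(first_two_steps(sample_counts),
                            first_two_steps(control_counts))
```

For instance, `normalize([10, 40, 20, 20], [10, 10, 20, 20], [0.25, 0.25, 0.75, 0.75], [1, 1, 1, 1])` leaves the second bin as the only one standing well above the others, because the GC-driven difference between the first and last two bins has been removed from both tracks before the ratio is taken.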
Pages: 9