Characterizing and measuring bias in sequence data

Times cited: 638
Authors
Ross, Michael G. [1]
Russ, Carsten [1]
Costello, Maura [1]
Hollinger, Andrew [1]
Lennon, Niall J. [1]
Hegarty, Ryan [1]
Nusbaum, Chad [1]
Jaffe, David B. [1]
Affiliation
[1] Broad Institute, 7 Cambridge Center, Cambridge, MA 02142, USA
Funding
National Institutes of Health
Keywords
BURROWS-WHEELER TRANSFORM; DNA-POLYMERASE; PACIFIC BIOSCIENCES; VARIATION DISCOVERY; BACTERIAL GENOMES; READ ALIGNMENT; ION TORRENT; PLATFORMS; FIDELITY; AMPLIFICATION
DOI
10.1186/gb-2013-14-5-r51
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline Classification Codes
071005 [Microbiology]; 090105 [Crop Production Systems and Ecological Engineering]
Abstract
Background: DNA sequencing technologies deviate from the ideal uniform distribution of reads. These biases impair scientific and medical applications. Accordingly, we have developed computational methods for discovering, describing and measuring bias.

Results: We applied these methods to the Illumina, Ion Torrent, Pacific Biosciences and Complete Genomics sequencing platforms, using human data and data from a set of microbes with diverse base compositions. Consistent with previous work, library construction conditions significantly influence sequencing bias. Pacific Biosciences coverage levels are the least biased, followed by Illumina, although all technologies exhibit error-rate biases in high- and low-GC regions and at long homopolymer runs. The GC-rich regions prone to low coverage include a number of human promoters, so we catalog 1,000 promoters that were exceptionally resistant to sequencing. Our results indicate that combining data from two technologies can reduce coverage bias if the biases in the component technologies are complementary and of similar magnitude. Analysis of Illumina data representing 120-fold coverage of a well-studied human sample reveals that 0.20% of the autosomal genome was covered at less than 10% of the genome-wide average. Excluding locations that were similar to known bias motifs or were likely due to differences between the sample and the reference left only 0.045% of the autosomal genome with unexplained poor coverage.

Conclusions: The assays presented in this paper provide a comprehensive view of sequencing bias, which can be used to drive laboratory improvements and to monitor production processes. Development guided by these assays should result in improved genome assemblies and better coverage of biologically important loci.
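The coverage measures described in the abstract (the fraction of positions covered at less than 10% of the genome-wide average, coverage as a function of GC content, and pooling two platforms with complementary biases) can be illustrated with a toy Python sketch. This is not the authors' method: the function names `low_coverage_fraction`, `gc_bias_profile` and `combine_normalized` are hypothetical, and the fixed non-overlapping GC windows and the scaling of each platform to a mean depth of 1.0 are illustrative assumptions.

```python
from collections import defaultdict


def low_coverage_fraction(depths, threshold=0.10):
    """Fraction of positions whose depth is below threshold * mean depth."""
    if not depths:
        raise ValueError("depths must be non-empty")
    mean_depth = sum(depths) / len(depths)
    cutoff = threshold * mean_depth
    return sum(1 for d in depths if d < cutoff) / len(depths)


def gc_bias_profile(sequence, depths, window=100, n_bins=10):
    """Relative coverage per GC bin (1.0 = genome-wide mean depth).

    Assumption: the sequence is cut into non-overlapping windows of `window`
    bases (a trailing partial window is skipped), and each window is assigned
    to one of `n_bins` equal-width GC-fraction bins.
    """
    if len(sequence) != len(depths):
        raise ValueError("sequence and depths must have the same length")
    mean_depth = sum(depths) / len(depths)
    window_depth_sums = defaultdict(float)
    window_counts = defaultdict(int)
    for start in range(0, len(sequence) - window + 1, window):
        chunk = sequence[start:start + window].upper()
        gc_fraction = (chunk.count("G") + chunk.count("C")) / window
        gc_bin = min(int(gc_fraction * n_bins), n_bins - 1)  # 100% GC joins the top bin
        window_depth_sums[gc_bin] += sum(depths[start:start + window]) / window
        window_counts[gc_bin] += 1
    return {
        b: window_depth_sums[b] / window_counts[b] / mean_depth
        for b in sorted(window_counts)
    }


def combine_normalized(depths_a, depths_b):
    """Sum two platforms' per-base depths after scaling each to mean 1.0."""
    if len(depths_a) != len(depths_b):
        raise ValueError("depth tracks must have the same length")
    mean_a = sum(depths_a) / len(depths_a)
    mean_b = sum(depths_b) / len(depths_b)
    return [a / mean_a + b / mean_b for a, b in zip(depths_a, depths_b)]


if __name__ == "__main__":
    # Toy 400-base reference: an AT-only half followed by a GC-only half.
    seq = "AT" * 100 + "GC" * 100
    platform_a = [30] * 200 + [1] * 200  # hypothetical platform that under-covers GC-rich DNA
    platform_b = [1] * 200 + [30] * 200  # hypothetical platform with the opposite bias

    print(low_coverage_fraction(platform_a))  # 0.5
    print(gc_bias_profile(seq, platform_a))   # bin 0 well above 1.0, bin 9 well below
    print(low_coverage_fraction(combine_normalized(platform_a, platform_b)))  # 0.0
```

In this toy case, half the positions fall below the 10% cutoff for a single platform, while the pooled profile has none because the two biases are complementary and of equal magnitude. The sketch ignores mappability and sample-reference differences, which the abstract notes can account for some poorly covered locations.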
Pages: 20