Effects of GC Bias in Next-Generation-Sequencing Data on De Novo Genome Assembly

Cited by: 161
Authors
Chen, Yen-Chun [1]
Liu, Tsunglin [2]
Yu, Chun-Hui [1]
Chiang, Tzen-Yuh [3]
Hwang, Chi-Chuan [1,4]
Affiliations
[1] Natl Cheng Kung Univ, Dept Engn Sci, Tainan 70101, Taiwan
[2] Natl Cheng Kung Univ, Inst Bioinformat & Biosignal Transduct, Tainan 70101, Taiwan
[3] Natl Cheng Kung Univ, Dept Life Sci, Tainan 70101, Taiwan
[4] Natl Cheng Kung Univ, Supercomp Res Ctr, Tainan 70101, Taiwan
Source
PLOS ONE | 2013 / Vol. 8 / Issue 04
Keywords
READ DATA SETS; LIBRARY PREPARATION; PCR AMPLIFICATION; DNA-SEQUENCES; TECHNOLOGIES; ALGORITHMS; CHALLENGES; EFFICIENT; ALLPATHS; REPEATS;
DOI
10.1371/journal.pone.0062856
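Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification code
07; 0710; 09
Abstract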
Next-generation-sequencing (NGS) has revolutionized the field of genome assembly because of its much higher data throughput and much lower cost compared with traditional Sanger sequencing. However, NGS poses new computational challenges to de novo genome assembly. Among the challenges, GC bias in NGS data is known to aggravate genome assembly. However, it is not clear to what extent GC bias affects genome assembly in general. In this work, we conduct a systematic analysis on the effects of GC bias on genome assembly. Our analyses reveal that GC bias only lowers assembly completeness when the degree of GC bias is above a threshold. At a strong GC bias, the assembly fragmentation due to GC bias can be explained by the low coverage of reads in the GC-poor or GC-rich regions of a genome. This effect is observed for all the assemblers under study. Increasing the total amount of NGS data thus rescues the assembly fragmentation because of GC bias. However, the amount of data needed for a full rescue depends on the distribution of GC contents. Both low and high coverage depths due to GC bias lower the accuracy of assembly. These pieces of information provide guidance toward a better de novo genome assembly in the presence of GC bias.
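The abstract's reasoning (strong GC bias leaves GC-poor and GC-rich regions under-covered, and more total data can rescue them) can be illustrated with a minimal sketch. The linear bias model, the function names, and the parameters `bias` and `min_depth` below are assumptions made for illustration only; they are not the model or thresholds used in the paper.

```python
from __future__ import annotations


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among the A/C/G/T bases in seq (0.0 if none)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def windowed_gc(genome: str, window: int) -> list[float]:
    """GC fraction of consecutive non-overlapping windows (tail dropped)."""
    return [
        gc_fraction(genome[i:i + window])
        for i in range(0, len(genome) - window + 1, window)
    ]


def expected_depth(gc: float, mean_depth: float, bias: float) -> float:
    """Hypothetical bias model (an assumption, not from the paper):
    depth is highest at 50% GC and falls off linearly toward 0% and 100% GC.
    bias=0 means uniform coverage; bias=1 means no reads at the extremes."""
    relative = 1.0 - bias * abs(gc - 0.5) / 0.5
    return mean_depth * max(relative, 0.0)


def undercovered_windows(
    gcs: list[float], mean_depth: float, bias: float, min_depth: float
) -> int:
    """Count windows whose expected depth falls below min_depth."""
    return sum(
        1 for gc in gcs if expected_depth(gc, mean_depth, bias) < min_depth
    )


# Toy genome: one GC-poor, one balanced, and one GC-rich window of 100 bp.
toy_genome = "AT" * 50 + "ACGT" * 25 + "GC" * 50
toy_gcs = windowed_gc(toy_genome, window=100)
```

In this toy setting, a strong bias at moderate depth leaves the two extreme windows below the chosen depth cutoff, while quadrupling the total depth lifts them above it. How much extra data is needed depends on how many windows sit at the GC extremes, which mirrors the abstract's point about the distribution of GC contents.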
Pages: 20