Systematic exploration of error sources in pyrosequencing flowgram data

Cited by: 65
Authors
Balzer, Susanne [1, 2]
Malde, Ketil [1]
Jonassen, Inge [2, 3]
Affiliations
[1] Inst Marine Res, N-5817 Bergen, Norway
[2] Univ Bergen, Dept Informat, N-5020 Bergen, Norway
[3] Uni Comp, Computat Biol Unit, N-5008 Bergen, Norway
Keywords
QUALITY;
DOI
10.1093/bioinformatics/btr251
Chinese Library Classification
Q5 [Biochemistry];
Subject classification codes
071010; 081704;
Abstract
Motivation: 454 pyrosequencing, by Roche Diagnostics, has emerged as an alternative to Sanger sequencing when it comes to read lengths, performance and cost, but shows higher per-base error rates. Although there are several tools available for noise removal, targeting different application fields, data interpretation would benefit from a better understanding of the different error types.
Results: By exploring 454 raw data, we quantify to what extent different factors account for sequencing errors. In addition to the well-known homopolymer length inaccuracies, we have identified errors likely to originate from other stages of the sequencing process. We use our findings to extend the flowsim pipeline with functionalities to simulate these errors, and thus enable a more realistic simulation of 454 pyrosequencing data with flowsim.
Pages: i304-i309
Page count: 6