Characteristics of 454 pyrosequencing data: enabling realistic simulation with flowsim

Cited by: 96
Authors
Balzer, Susanne [1, 2]
Malde, Ketil [1]
Lanzen, Anders [3, 4]
Sharma, Animesh [2]
Jonassen, Inge [2, 3]
Affiliations
[1] Inst Marine Res, N-5817 Bergen, Norway
[2] Univ Bergen, Dept Informat, N-5020 Bergen, Norway
[3] Bergen Ctr Computat Sci, Computat Biol Unit, N-5008 Bergen, Norway
[4] Univ Bergen, Dept Biol, N-5020 Bergen, Norway
Keywords
QUALITY
DOI
10.1093/bioinformatics/btq365
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: The commercial launch of 454 pyrosequencing in 2005 was a milestone in genome sequencing in terms of performance and cost. Over the three releases available so far, average read lengths have increased to approximately 500 base pairs and are thus approaching the read lengths obtained from traditional Sanger sequencing. The design of sequencing projects would benefit from the ability to simulate experiments.

Results: We explore 454 raw data to investigate its characteristics and derive empirical distributions for the flow values generated by pyrosequencing. Based on our findings, we implement Flowsim, a simulator that generates realistic pyrosequencing data files of arbitrary size from a given set of input DNA sequences. Finally, we use our simulator to examine the impact of sequence lengths on the results of concrete whole-genome assemblies, and we suggest its use in the planning of sequencing projects, the benchmarking of assembly methods and other fields.
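
The idea of simulating flow values from input DNA can be illustrated with a minimal Python sketch. It is not Flowsim's implementation: the function names, the Gaussian noise model and its parameters (`base_sd`, `sd_per_base`) are placeholders assumed here for illustration, whereas the abstract states that Flowsim draws from empirical distributions derived from real 454 data. The sketch assumes the cyclic nucleotide flow order TACG.

```python
import random

FLOW_ORDER = "TACG"  # assumed cyclic nucleotide flow order


def ideal_flowgram(seq, flow_order=FLOW_ORDER):
    """Return the error-free flowgram: one homopolymer length per flow."""
    seq = seq.upper()
    if any(base not in flow_order for base in seq):
        raise ValueError("sequence contains a base outside the flow order")
    flows, pos, flow_index = [], 0, 0
    while pos < len(seq):
        base = flow_order[flow_index % len(flow_order)]
        run = 0
        # Consume the whole homopolymer run that matches the current flow.
        while pos < len(seq) and seq[pos] == base:
            run += 1
            pos += 1
        flows.append(run)
        flow_index += 1
    return flows


def noisy_flowgram(flows, rng, base_sd=0.05, sd_per_base=0.03):
    """Perturb ideal flow values with placeholder Gaussian noise.

    The noise grows with homopolymer length; both parameters are
    illustrative assumptions, not values taken from the paper.
    """
    return [max(0.0, rng.gauss(n, base_sd + sd_per_base * n)) for n in flows]


def call_bases(values, flow_order=FLOW_ORDER):
    """Round each flow value to the nearest integer and rebuild the read."""
    return "".join(
        flow_order[i % len(flow_order)] * int(round(v))
        for i, v in enumerate(values)
    )


if __name__ == "__main__":
    rng = random.Random(42)
    ideal = ideal_flowgram("TTACGGA")
    noisy = noisy_flowgram(ideal, rng)
    print(ideal)
    print([round(v, 2) for v in noisy])
    print(call_bases(noisy))
```

With larger noise parameters, rounding errors in long homopolymer runs appear as insertions and deletions in the called read, which is the kind of error a flow-level simulator is meant to reproduce.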
Pages: i420-i425
Number of pages: 6