De novo assembly of the Pseudomonas syringae pv. syringae B728a genome using Illumina/Solexa short sequence reads

Cited by: 65
Authors
Farrer, Rhys A. [1]
Kemen, Eric [1]
Jones, Jonathan D. G. [1]
Studholme, David J. [1]
Affiliations
[1] Sainsbury Lab, Norwich NR4 7UH, Norfolk, England
Keywords
genome sequencing; de novo sequence assembly; Pseudomonas syringae; Bioinformatics; Illumina; Solexa; SHORT DNA-SEQUENCES; MILLIONS
DOI
10.1111/j.1574-6968.2008.01441.x
Chinese Library Classification (CLC) number
Q93 [Microbiology]
Discipline classification code
071005; 100705
Abstract
Illumina's Genome Analyzer generates ultra-short sequence reads, typically 36 nucleotides in length, and is primarily intended for resequencing. We tested the potential of this technology for de novo sequence assembly on the 6 Mbp genome of Pseudomonas syringae pv. syringae B728a with several freely available assembly software packages. Using an unpaired data set, Velvet assembled >96% of the genome into contigs with an N50 length of 8289 nucleotides and an error rate of 0.33%. Edena generated smaller contigs (N50 of 4192 nucleotides) with comparable error rates. SSAKE and VCAKE yielded shorter contigs with very high error rates. Assembly of paired-end sequence data carrying 400 bp inserts produced longer contigs (N50 up to 15 628 nucleotides), but with increased error rates (0.5%). Contig length and error rate were very sensitive to the choice of parameter values. Noncoding RNA genes were poorly resolved in de novo assemblies, while >90% of the protein-coding genes were assembled with 100% accuracy over their full length. This study demonstrates that, in practice, de novo assembly of 36-nucleotide reads can generate reasonably accurate assemblies from sequence data sets of about 40× depth. These draft assemblies are useful for exploring an organism's proteomic potential at very low cost.
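The abstract reports assembly quality as N50 contig length and sequencing effort as depth of coverage. As a minimal sketch of how these two quantities are commonly computed (the function names `n50` and `reads_for_depth` are hypothetical and are not taken from the paper, and N50 is assumed here to be measured against the total assembled length rather than the reference genome size):

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: running sum always reaches half")


def reads_for_depth(genome_size: int, read_length: int, depth: int) -> int:
    """Return the number of reads needed to reach a given average depth,
    using depth = reads * read_length / genome_size (rounded up)."""
    total_bases = depth * genome_size
    return -(-total_bases // read_length)  # integer ceiling division


if __name__ == "__main__":
    # Toy contig set, not data from the study.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
    # Reads of 36 nt needed for 40x depth on a 6 Mbp genome.
    print(reads_for_depth(6_000_000, 36, 40))
```

Under these assumptions, 40× depth over a 6 Mbp genome with 36-nucleotide reads corresponds to roughly 6.7 million reads.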
Pages: 103-111
Page count: 9