An Integrated Pipeline for de Novo Assembly of Microbial Genomes

Times cited: 346
Authors
Tritt, Andrew [1]
Eisen, Jonathan A. [1,2,3]
Facciotti, Marc T. [1,4]
Darling, Aaron E. [1]
Affiliations
[1] Univ Calif Davis, Genome Ctr, Davis, CA 95616 USA
[2] Univ Calif Davis, Dept Ecol & Evolut, Davis, CA 95616 USA
[3] Univ Calif Davis, Dept Med Microbiol & Immunol, Davis, CA 95616 USA
[4] Univ Calif Davis, Dept Biomed Engn, Davis, CA 95616 USA
Source
PLOS ONE | 2012 / Vol. 7 / No. 9
Funding
U.S. National Science Foundation
Keywords
SEQUENCING DATA; ALGORITHM; REARRANGEMENTS; CONSTRUCTION; ALIGNMENT; MAUVE; TOOL
DOI
10.1371/journal.pone.0042304
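Chinese Library Classification (CLC) numbers
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [Natural Sciences, General]
Subject classification codes
07; 0710; 09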
Abstract
Remarkable advances in DNA sequencing technology have created a need for de novo genome assembly methods tailored to work with the new sequencing data types. Many such methods have been published in recent years, but assembling raw sequence data to obtain a draft genome has remained a complex, multi-step process, involving several stages of sequence data cleaning, error correction, assembly, and quality control. Successful application of these steps usually requires intimate knowledge of a diverse set of algorithms and software. We present an assembly pipeline called A5 (Andrew And Aaron's Awesome Assembly pipeline) that simplifies the entire genome assembly process by automating these stages, integrating several previously published algorithms with new algorithms for quality control and automated assembly parameter selection. We demonstrate that A5 can produce assemblies of quality comparable to a leading assembly algorithm, SOAPdenovo, without any prior knowledge of the particular genome being assembled and without the extensive parameter tuning required by SOAPdenovo. In particular, the assemblies produced by A5 exhibit a reduction of 50% or more in broken protein coding sequences relative to SOAPdenovo assemblies. The A5 pipeline can also assemble Illumina sequence data from libraries constructed by the Nextera (transposon-catalyzed) protocol, which have markedly different characteristics from mechanically sheared libraries. Finally, A5 has modest compute requirements and can assemble a typical bacterial genome on current desktop or laptop computer hardware in under two hours, depending on depth of coverage.
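The runtime quoted in the abstract depends on depth of coverage, meaning the average number of reads that overlap each base of the genome. The following illustration is not part of A5. It computes expected mean coverage with the standard Lander-Waterman relation C = N * L / G, where N is the number of reads, L the read length, and G the genome size. The read count, read length, and genome size in the example are hypothetical values chosen for easy arithmetic.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Expected mean depth of coverage, C = N * L / G (Lander-Waterman)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def reads_for_coverage(target_coverage: float, genome_size: int, read_length: int) -> int:
    """Smallest read count N such that N * L / G reaches the target coverage."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_coverage * genome_size / read_length)


# Hypothetical example: 2,000,000 reads of 100 bp on a 5 Mbp bacterial genome.
print(mean_coverage(2_000_000, 100, 5_000_000))   # 40.0
print(reads_for_coverage(40, 5_000_000, 100))     # 2000000
```

Doubling the number of reads doubles the expected coverage, and with it the amount of data each assembly stage must process. This is why the reported runtime scales with depth of coverage.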
Pages: 9