Ray: Simultaneous Assembly of Reads from a Mix of High-Throughput Sequencing Technologies

Times cited: 386
Authors
Boisvert, Sebastien [1, 3]
Laviolette, Francois [2]
Corbeil, Jacques [1, 3]
Affiliations
[1] Univ Laval, Dept Mol Med, Quebec City, PQ G1V 0A6, Canada
[2] Univ Laval, Dept Informat & Genie Logiciel, Quebec City, PQ G1V 0A6, Canada
[3] Ctr Rech CHUQ, Quebec City, PQ, Canada
Funding
Natural Sciences and Engineering Research Council of Canada; Canadian Institutes of Health Research
Keywords
de Bruijn graphs; genome assembly; high-throughput sequencing; SHORT DNA-SEQUENCES; GENOME SEQUENCE; GENERATION; VERSATILE; MILLIONS
DOI
10.1089/cmb.2009.0238
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
An accurate genome sequence of a desired species is now a prerequisite for genome research. An important step in obtaining a high-quality genome sequence is to correctly assemble short reads into longer sequences that represent contiguous genomic regions. Current sequencing technologies continue to offer increases in throughput, with corresponding reductions in cost and time. Unfortunately, the benefit of obtaining a large number of reads is complicated by sequencing errors, and each platform exhibits different biases. Although software is available to assemble reads from each individual system, no procedure has been proposed for high-quality simultaneous assembly of reads from a mix of different technologies. In this paper, we describe a parallel short-read assembler, called Ray, which has been developed to assemble reads obtained from a combination of sequencing platforms. We compared its performance to that of other assemblers on simulated and real datasets. We used a combination of Roche/454 and Illumina reads to assemble three different genomes, and showed that mixing sequencing technologies systematically reduces both the number of contigs and the number of errors. Because of its open nature, this new tool will hopefully serve as a basis for developing an assembler of universal utility (availability: http://deNovoAssembler.sf.Net/). For online Supplementary Material, see www.liebertonline.com.
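The abstract describes pooling reads from several platforms into one assembly, and the keywords name de Bruijn graphs. As a minimal sketch of why pooling can reduce the number of contigs, the toy Python code below builds a plain de Bruijn graph from k-mers taken from both short reads and a longer read (loosely standing in for Illumina and Roche/454 reads), then spells out its non-branching paths. It rests on strong simplifying assumptions: error-free reads, a single strand with no reverse complements, no repeats, and hypothetical helper names (`kmers`, `build_graph`, `contigs`). It is a generic textbook construction, not Ray's own graph traversal, which is not reproduced here.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every k-mer (length-k substring) of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_graph(reads, k):
    """Pool reads of any length into one de Bruijn graph.

    Nodes are (k-1)-mers; each distinct k-mer adds one edge from its
    prefix to its suffix. Reads are assumed error-free and single-stranded.
    """
    edges = set()
    for read in reads:
        edges.update(kmers(read, k))
    succ = defaultdict(set)  # node -> successor nodes
    pred = defaultdict(set)  # node -> predecessor nodes
    for km in edges:
        succ[km[:-1]].add(km[1:])
        pred[km[1:]].add(km[:-1])
    return succ, pred


def contigs(succ, pred):
    """Spell every maximal non-branching path (unitig) as a sequence.

    Isolated cycles, in which every node is internal, are ignored
    in this toy version.
    """
    def is_internal(node):
        # Internal: exactly one incoming and one outgoing edge.
        return len(pred[node]) == 1 and len(succ[node]) == 1

    result = []
    for start in set(succ) | set(pred):
        if is_internal(start):
            continue  # paths only begin at branching or terminal nodes
        for nxt in succ[start]:
            path = start + nxt[-1]
            cur = nxt
            while is_internal(cur):
                (cur,) = succ[cur]  # follow the single successor
                path += cur[-1]
            result.append(path)
    return sorted(result)


short_reads = ["ACGTTA", "CGTTAG", "GCCTGA"]  # toy short reads with a coverage gap
long_read = "TTAGCCTG"                         # toy longer read spanning the gap
print(contigs(*build_graph(short_reads, 4)))
# ['ACGTTAG', 'GCCTGA']
print(contigs(*build_graph(short_reads + [long_read], 4)))
# ['ACGTTAGCCTGA']
```

With the short reads alone, the 4-mers `TAGC` and `AGCC` are absent, so the graph falls apart into two contigs; the longer read supplies them and the pieces join into one. Real data add sequencing errors, repeats, and both DNA strands, none of which this sketch handles.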
Pages: 1519-1533
Number of pages: 15
References (38 in total)
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF; Madden, TL; Schaffer, AA; Zhang, JH; Zhang, Z; Miller, W; Lipman, DJ.
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]   High quality draft sequences for prokaryotic genomes using a mix of new sequencing technologies [J].
Aury, Jean-Marc; Cruaud, Corinne; Barbe, Valerie; Rogier, Odile; Mangenot, Sophie; Samson, Gaelle; Poulain, Julie; Anthouard, Veronique; Scarpelli, Claude; Artiguenave, Francois; Wincker, Patrick.
BMC GENOMICS, 2008, 9 (1)
[3]   Unique features revealed by the genome sequence of Acinetobacter sp ADP1, a versatile and naturally transformation competent bacterium [J].
Barbe, V; Vallenet, D; Fonknechten, N; Kreimeyer, A; Oztas, S; Labarre, L; Cruveiller, S; Robert, C; Duprat, S; Wincker, P; Ornston, LN; Weissenbach, J; Marlière, P; Cohen, GN; Médigue, C.
NUCLEIC ACIDS RESEARCH, 2004, 32 (19) :5766-5779
[4]   Batzoglou, S.
GENOME RESEARCH, 2002, 12 :177, DOI 10.1101/gr.208902
[5]   Accurate whole human genome sequencing using reversible terminator chemistry [J].
Bentley, David R.; Balasubramanian, Shankar; Swerdlow, Harold P.; Smith, Geoffrey P.; Milton, John; Brown, Clive G.; Hall, Kevin P.; Evers, Dirk J.; Barnes, Colin L.; Bignell, Helen R.; Boutell, Jonathan M.; Bryant, Jason; Carter, Richard J.; Cheetham, R. Keira; Cox, Anthony J.; Ellis, Darren J.; Flatbush, Michael R.; Gormley, Niall A.; Humphray, Sean J.; Irving, Leslie J.; Karbelashvili, Mirian S.; Kirk, Scott M.; Li, Heng; Liu, Xiaohai; Maisinger, Klaus S.; Murray, Lisa J.; Obradovic, Bojan; Ost, Tobias; Parkinson, Michael L.; Pratt, Mark R.; Rasolonjatovo, Isabelle M. J.; Reed, Mark T.; Rigatti, Roberto; Rodighiero, Chiara; Ross, Mark T.; Sabot, Andrea; Sankar, Subramanian V.; Scally, Aylwyn; Schroth, Gary P.; Smith, Mark E.; Smith, Vincent P.; Spiridou, Anastassia; Torrance, Peta E.; Tzonev, Svilen S.; Vermaas, Eric H.; Walter, Klaudia; Wu, Xiaolin; Zhang, Lu; Alam, Mohammed D.; Anastasi, Carole.
NATURE, 2008, 456 (7218) :53-59
[6]   The complete genome sequence of Escherichia coli K-12 [J].
Blattner, FR; Plunkett, G; Bloch, CA; Perna, NT; Burland, V; Riley, M; ColladoVides, J; Glasner, JD; Rode, CK; Mayhew, GF; Gregor, J; Davis, NW; Kirkpatrick, HA; Goeden, MA; Rose, DJ; Mau, B; Shao, Y.
SCIENCE, 1997, 277 (5331) :1453-+
[7]   ALLPATHS: De novo assembly of whole-genome shotgun microreads [J].
Butler, Jonathan; MacCallum, Iain; Kleber, Michael; Shlyakhter, Ilya A.; Belmonte, Matthew K.; Lander, Eric S.; Nusbaum, Chad; Jaffe, David B.
GENOME RESEARCH, 2008, 18 (05) :810-820
[8]   Fragment assembly with short reads [J].
Chaisson, M; Pevzner, P; Tang, HX.
BIOINFORMATICS, 2004, 20 (13) :2067-2074
[9]   Short read fragment assembly of bacterial genomes [J].
Chaisson, Mark J.; Pevzner, Pavel A.
GENOME RESEARCH, 2008, 18 (02) :324-330
[10]   De novo fragment assembly with short mate-paired reads: Does the read length matter? [J].
Chaisson, Mark J.; Brinza, Dumitru; Pevzner, Pavel A.
GENOME RESEARCH, 2009, 19 (02) :336-346