Crystallizing short-read assemblies around seeds

Cited by: 22
Authors
Hossain, Mohammad Sajjad [1]
Azimi, Navid [1]
Skiena, Steven [1]
Affiliation
[1] SUNY Stony Brook, Dept Comp Sci, Stony Brook, NY 11794 USA
Source
BMC BIOINFORMATICS | 2009 / Vol. 10
Funding
U.S. National Science Foundation
Keywords
ALGORITHM;
DOI
10.1186/1471-2105-10-S1-S16
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: New short-read sequencing technologies produce enormous volumes of 25-30 base paired-end reads. The resulting reads have vastly different characteristics than those produced by Sanger sequencing, and they require different approaches than the previous generation of sequence assemblers. In this paper, we present a short-read de novo assembler particularly targeted at the new ABI SOLiD sequencing technology.
Results: This paper presents what we believe to be the first de novo sequence assembly results on real data from the emerging SOLiD platform, introduced by Applied Biosystems. Our assembler SHORTY augments short paired reads using a trivially small number (5-10) of seeds of length 300-500 bp. These seeds enable us to produce significant assemblies using short-read coverage of no more than 100x, which can be obtained in a single run of these high-capacity sequencers. SHORTY exploits two ideas which we believe to be of interest to the short-read assembly community: (1) using single seed reads to crystallize assemblies, and (2) estimating intercontig distances accurately from multiple spanning paired-end reads.
Conclusion: We demonstrate effective assemblies (N50 contig sizes of approximately 40 kb) of three different bacterial species using simulated SOLiD data. Sequencing artifacts limit our performance on real data; however, our results on this data are substantially better than those achieved by competing assemblers.
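The two quantities named in the abstract, the N50 contig size and an intercontig distance estimated from several spanning read pairs, can be illustrated with a minimal sketch. This is not SHORTY's implementation: the function names, the (a, b) pair representation, the known mean insert size, and the use of a median are all assumptions made for illustration. N50 follows the usual definition (the contig length at which the longest contigs first cover half of the assembled bases).

```python
from statistics import median


def n50(contig_lengths):
    """Return the N50 of a set of contig lengths.

    Contigs are taken longest first; the N50 is the length of the contig
    at which the running total first reaches half of all assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


def estimate_gap(spanning_pairs, mean_insert):
    """Estimate the distance between two contigs (hypothetical helper).

    Each pair (a, b) is assumed to describe one read pair spanning the gap:
    a = bases from the outer end of the first read to the end of contig A,
    b = bases from the start of contig B to the outer end of the second read.
    Each pair gives one estimate, mean_insert - a - b; the median over all
    pairs damps the effect of insert-size variation and outliers.
    """
    if not spanning_pairs:
        raise ValueError("need at least one spanning pair")
    estimates = [mean_insert - a - b for a, b in spanning_pairs]
    return median(estimates)


if __name__ == "__main__":
    print(n50([80, 70, 50, 40, 30, 20]))
    print(estimate_gap([(400, 450), (300, 560), (500, 340)], 1000))
```

Combining many spanning pairs is what makes the distance estimate tighter than any single pair allows; the median used here is one simple choice, and a mean or a weighted estimator would fit the same interface.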
Pages: 12
Related papers
28 entries in total
[1]   Sequence information can be obtained from single DNA molecules [J].
Braslavsky, I ;
Hebert, B ;
Kartalov, E ;
Quake, SR .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2003, 100 (07) :3960-3964
[2]   Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays [J].
Brenner, S ;
Johnson, M ;
Bridgham, J ;
Golda, G ;
Lloyd, DH ;
Johnson, D ;
Luo, SJ ;
McCurdy, S ;
Foy, M ;
Ewan, M ;
Roth, R ;
George, D ;
Eletr, S ;
Albrecht, G ;
Vermaas, E ;
Williams, SR ;
Moon, K ;
Burcham, T ;
Pallas, M ;
DuBridge, RB ;
Kirchner, J ;
Fearon, K ;
Mao, J ;
Corcoran, K .
NATURE BIOTECHNOLOGY, 2000, 18 (06) :630-634
[3]   ALLPATHS: De novo assembly of whole-genome shotgun microreads [J].
Butler, Jonathan ;
MacCallum, Iain ;
Kleber, Michael ;
Shlyakhter, Ilya A. ;
Belmonte, Matthew K. ;
Lander, Eric S. ;
Nusbaum, Chad ;
Jaffe, David B. .
GENOME RESEARCH, 2008, 18 (05) :810-820
[4]   Fragment assembly with short reads [J].
Chaisson, M ;
Pevzner, P ;
Tang, HX .
BIOINFORMATICS, 2004, 20 (13) :2067-2074
[5]   CHAISSON M, SHORT READ IN PRESS
[6]   SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing [J].
Dohm, Juliane C. ;
Lottaz, Claudio ;
Borodina, Tatiana ;
Himmelbauer, Heinz .
GENOME RESEARCH, 2007, 17 (11) :1697-1706
[7]   Base-calling of automated sequencer traces using phred. II. Error probabilities [J].
Ewing, B ;
Green, P .
GENOME RESEARCH, 1998, 8 (03) :186-194
[8]   Base-calling of automated sequencer traces using phred. I. Accuracy assessment [J].
Ewing, B ;
Hillier, L ;
Wendl, MC ;
Green, P .
GENOME RESEARCH, 1998, 8 (03) :175-185
[9]   Single-molecule DNA sequencing of a viral genome [J].
Harris, Timothy D. ;
Buzby, Phillip R. ;
Babcock, Hazen ;
Beer, Eric ;
Bowers, Jayson ;
Braslavsky, Ido ;
Causey, Marie ;
Colonell, Jennifer ;
DiMeo, James ;
Efcavitch, J. William ;
Giladi, Eldar ;
Gill, Jaime ;
Healy, John ;
Jarosz, Mirna ;
Lapen, Dan ;
Moulton, Keith ;
Quake, Stephen R. ;
Steinmann, Kathleen ;
Thayer, Edward ;
Tyurina, Anastasia ;
Ward, Rebecca ;
Weiss, Howard ;
Xie, Zheng .
SCIENCE, 2008, 320 (5872) :106-109
[10]   HERNANDEZ D, 2008, GENOME RES