Velvet: Algorithms for de novo short read assembly using de Bruijn graphs

Citations: 7330
Authors
Zerbino, Daniel R. [1]
Birney, Ewan [1]
Affiliation
[1] EMBL European Bioinformatics Institute, Cambridge CB10 1SD, England
Funding
UK Medical Research Council
DOI
10.1101/gr.074492.107
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
We have developed a new set of algorithms, collectively called "Velvet," to manipulate de Bruijn graphs for genomic sequence assembly. A de Bruijn graph is a compact representation based on short words (k-mers) that is ideal for high-coverage, very short read (25-50 bp) data sets. Applying Velvet to very short reads and paired-end information only, one can produce contigs of significant length, up to 50-kb N50 length in simulations of prokaryotic data and 3-kb N50 on simulated mammalian BACs. When applied to real Solexa data sets without read pairs, Velvet generated contigs of approximately 8 kb in a prokaryote and 2 kb in a mammalian BAC, in close agreement with our simulated results without read-pair information. Velvet represents a new approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies.
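To illustrate the two technical terms in the abstract, the following is a minimal sketch of a textbook de Bruijn graph built from k-mers, together with an N50 calculation. It is not Velvet's own code or data structure: the function names `build_de_bruijn` and `n50`, the choice of (k-1)-mers as nodes, and the toy reads are all assumptions made for illustration. Real assemblers also handle reverse complements, sequencing errors, and coverage, which this sketch ignores.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes it points to.

    Hypothetical helper: a textbook construction, not Velvet's implementation.
    """
    graph = defaultdict(set)
    for read in reads:
        # Every k-mer in the read contributes one edge: prefix -> suffix.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def n50(contig_lengths):
    """Return the length L such that contigs of length >= L
    together cover at least half of the total assembled length."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no contigs


if __name__ == "__main__":
    # Toy overlapping reads; k is chosen arbitrarily for the example.
    graph = build_de_bruijn(["ACGTAC", "GTACG"], k=3)
    for node in sorted(graph):
        print(node, "->", sorted(graph[node]))
    print("N50:", n50([2, 3, 4, 5, 10]))
```

Because overlapping reads share k-mers, they collapse onto the same edges, which is why the graph stays compact at high coverage.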
Pages: 821-829
Page count: 9