Comparison of the two major classes of assembly algorithms: overlap-layout-consensus and de-bruijn-graph

Cited by: 169
Authors
Li, Zhenyu [1]
Chen, Yanxiang [1]
Mu, Desheng [1]
Yuan, Jianying [1]
Shi, Yujian [1]
Zhang, Hao [1]
Gan, Jun [1]
Li, Nan [1]
Hu, Xuesong [1]
Liu, Binghang [1]
Yang, Bicheng
Fan, Wei [1]
Affiliations
[1] Beijing Genom Inst Shenzhen BGI SZ, Sci & Technol Dept, DNA Sequence Assembly Team, Beijing, Peoples R China
Keywords
OLC; DBG; de novo assembly; second-generation; READ ERROR-CORRECTION; DRAFT ASSEMBLIES; SILKWORM BOMBYX; SEQUENCE DATA; PAIRED READS; GENOME; LENGTH;
DOI
10.1093/bfgp/elr035
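Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Subject classification code
071005 [Microbiology]; 090105 [Crop Production Systems and Ecological Engineering];
Abstract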
Since the completion of the cucumber and panda genome projects using Illumina sequencing in 2009, the global scientific community has had to pay much more attention to this new cost-effective approach to generate the draft sequence of large genomes. To allow new users to more easily understand the assembly algorithms and the optimum software packages for their projects, we make a detailed comparison of the two major classes of assembly algorithms: overlap-layout-consensus and de-bruijn-graph, from how they match the Lander-Waterman model, to the required sequencing depth and reads length. We also discuss the computational efficiency of each class of algorithm, the influence of repeats and heterozygosity and points of note in the subsequent scaffold linkage and gap closure steps. We hope this review can help further promote the application of second-generation de novo sequencing, as well as aid the future development of assembly algorithms.
Pages: 25-37
Page count: 13