Rapid hybrid de novo assembly of a microbial genome using only short reads: Corynebacterium pseudotuberculosis I19 as a case study

Cited by: 40
Authors
Cerdeira, Louise Teixeira
Carneiro, Adriana Ribeiro
Juca Ramos, Rommel Thiago
de Almeida, Sintia Silva [2]
D'Afonseca, Vivian [2]
Cruz Schneider, Maria Paula
Baumbach, Jan [4]
Tauch, Andreas [3]
McCulloch, John Anthony
Carvalho Azevedo, Vasco Ariston [2]
Silva, Artur [1]
Affiliations
[1] Fed Univ Para, Lab Polimorfismo DNA, Inst Ciencias Biol, BR-66075110 Belem, PA, Brazil
[2] Univ Fed Minas Gerais, Inst Ciencias Biol, Belo Horizonte, MG, Brazil
[3] Univ Bielefeld, Inst Genome Res & Syst Biol, Ctr Biotechnol, Germany Inst Genome Res, Bielefeld, Germany
[4] Max Planck Inst Informat, Computat Syst Biol Grp, Saarbrucken, Germany
Keywords
De novo; SOLiD; Assembly; Short read; Corynebacterium; Next generation sequencing; SEQUENCE; VELVET
DOI
10.1016/j.mimet.2011.05.008
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
With the advent of so-called Next-Generation Sequencing (NGS) technologies, the monetary and temporal resources required for whole-genome sequencing have been reduced by several orders of magnitude. Sequence reads can be assembled either by anchoring them directly onto an available reference genome (classical reference assembly) or by concatenating them by overlap (de novo assembly). The latter strategy is preferable because it tends to maintain the architecture of the genome sequence; however, depending on the NGS platform used, the shortness of the reads causes tremendous problems in the subsequent genome assembly phase, impeding closing of the entire genome sequence. To address the problem, we developed a multi-pronged hybrid de novo strategy combining De Bruijn graph and Overlap-Layout-Consensus methods, which was used to assemble from short reads the entire genome of Corynebacterium pseudotuberculosis strain I19, a bacterium of immense importance in veterinary medicine that causes caseous lymphadenitis in ruminants, principally ovines and caprines. Briefly, contigs were assembled de novo from the short reads and were only oriented using a reference genome by anchoring. Remaining gaps were closed using iterative anchoring of short reads by craning to gap flanks. Finally, we compare the genome sequence assembled using our hybrid strategy to a classical reference assembly using the same data as input and show that, when a reference genome is available, it pays off to use the hybrid de novo strategy rather than a classical reference assembly, because more genome sequences are preserved using the former. © 2011 Elsevier B.V. All rights reserved.
Pages: 218-223
Number of pages: 6