Assembling the 20 Gb white spruce (Picea glauca) genome from whole-genome shotgun sequencing data

Cited by: 268
Authors
Birol, Inanc [1, 2, 3]
Raymond, Anthony [1]
Jackman, Shaun D. [1]
Pleasance, Stephen [1]
Coope, Robin [1]
Taylor, Greg A. [1]
Saint Yuen, Macaire Man [4]
Keeling, Christopher I. [4]
Brand, Dana [1]
Vandervalk, Benjamin P. [1]
Kirk, Heather [1]
Pandoh, Pawan [1]
Moore, Richard A. [1]
Zhao, Yongjun [1]
Mungall, Andrew J. [1]
Jaquish, Barry [5]
Yanchuk, Alvin [5]
Ritland, Carol [4, 6]
Boyle, Brian [7]
Bousquet, Jean [7, 8]
Ritland, Kermit [6]
MacKay, John [7, 8]
Bohlmann, Joerg [4, 6]
Jones, Steven J. M. [1, 2, 9]
Affiliations
[1] British Columbia Canc Agcy, Genome Sci Ctr, Vancouver, BC V5Z 4S6, Canada
[2] Univ British Columbia, Dept Med Genet, Vancouver, BC V6H 3N1, Canada
[3] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC V5A 1S6, Canada
[4] Univ British Columbia, Michael Smith Labs, Vancouver, BC V6T 1Z4, Canada
[5] British Columbia Minist Forests, Lands & Nat Resou, Victoria, BC V8W 9C2, Canada
[6] Univ British Columbia, Dept Forest Sci, Vancouver, BC V6T 1Z4, Canada
[7] Univ Laval, Inst Syst & Integrat Biol, Quebec City, PQ G1K 7P4, Canada
[8] Univ Laval, Dept Wood & Forest Sci, Quebec City, PQ G1V 0A6, Canada
[9] Simon Fraser Univ, Dept Mol Biol & Biochem, Burnaby, BC V5A 1S6, Canada
Keywords
MOUNTAIN PINE-BEETLE; NOVO; IDENTIFICATION; SYNTHASE; GENES;
DOI
10.1093/bioinformatics/btt178
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
White spruce (Picea glauca) is a dominant conifer of the boreal forests of North America, and providing genomics resources for this commercially valuable tree will help improve forest management and conservation efforts. Sequencing and assembling the large and highly repetitive spruce genome, however, pushes the boundaries of current technology. Here, we describe a whole-genome shotgun sequencing strategy using two Illumina sequencing platforms and an assembly approach using the ABySS software. We report a 20.8 gigabase pair (Gb) draft genome in 4.9 million scaffolds, with a scaffold N50 of 20,356 bp. We demonstrate how recent improvements in sequencing technology, especially increasing read lengths and paired-end reads from longer fragments, have a major impact on assembly contiguity. We also note that scalable bioinformatics tools are instrumental in providing rapid draft assemblies.
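The scaffold N50 reported in the abstract is a contiguity statistic: the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. The following is a minimal sketch of that standard definition, not the ABySS implementation; the function name `n50` and the toy lengths are illustrative assumptions and are unrelated to the spruce data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Standard definition: sort lengths in descending order and return the
    length at which the running sum first reaches half of the total.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical scaffold lengths in bp (illustrative only).
toy_scaffolds = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_scaffolds)
```

With the toy input, the two longest scaffolds (8 + 8 = 16 bp) already cover half of the 32 bp total, so the statistic equals the length of the second one. A higher N50 at a similar total assembly size indicates a more contiguous assembly.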
Pages: 1492-1497
Number of pages: 6