Reference-guided assembly of four diverse Arabidopsis thaliana genomes

Cited by: 183
Authors
Schneeberger, Korbinian [1, 2]
Ossowski, Stephan [1, 3, 4]
Ott, Felix [1]
Klein, Juliane D. [5]
Wang, Xi [1]
Lanz, Christa [1]
Smith, Lisa M. [1]
Cao, Jun [1]
Fitz, Joffrey [1]
Warthmann, Norman [1]
Henz, Stefan R. [1]
Huson, Daniel H. [5]
Weigel, Detlef [1]
Affiliations
[1] Max Planck Inst Dev Biol, Dept Mol Biol, D-72076 Tubingen, Germany
[2] Max Planck Inst Plant Breeding Res, Dept Plant Dev Biol, D-50829 Cologne, Germany
[3] UPF, Barcelona 08003, Spain
[4] CRG, Genes & Dis Program, Genom & Epigen Variat Dis Grp, Barcelona 08003, Spain
[5] Univ Tubingen, Ctr Bioinformat Tubingen, D-72076 Tubingen, Germany
Keywords
STRUCTURAL VARIATION; SEQUENCE DATA; SHORT READS; IDENTIFICATION; EXPRESSION; POLYMORPHISMS; ALGORITHMS; ALIGNMENT;
DOI
10.1073/pnas.1107739108
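Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code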
07 ; 0710 ; 09 ;
Abstract
We present whole-genome assemblies of four divergent Arabidopsis thaliana strains that complement the 125-Mb reference genome sequence released a decade ago. Using a newly developed reference-guided approach, we assembled large contigs from 9 to 42 Gb of Illumina short-read data from the Landsberg erecta (Ler-1), C24, Bur-0, and Kro-0 strains, which have been sequenced as part of the 1,001 Genomes Project for this species. Using alignments against the reference sequence, we first reduced the complexity of the de novo assembly and later integrated reads without similarity to the reference sequence. As an example, half of the noncentromeric C24 genome was covered by scaffolds that are longer than 260 kb, with a maximum of 2.2 Mb. Moreover, over 96% of the reference genome was covered by the reference-guided assembly, compared with only 87% with a complete de novo assembly. Comparisons with 2 Mb of dideoxy sequence reveal that the per-base error rate of the reference-guided assemblies was below 1 in 10,000. Our assemblies provide a detailed, genomewide picture of large-scale differences between A. thaliana individuals, most of which are difficult to access with alignment-consensus methods only. We demonstrate their practical relevance in studying the expression differences of polymorphic genes and show how the analysis of sRNA sequencing data can lead to erroneous conclusions if aligned against the reference genome alone. Genome assemblies, raw reads, and further information are accessible through http://1001genomes.org/projects/assemblies.html.
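The scaffold figure quoted in the abstract (half of the noncentromeric C24 genome covered by scaffolds longer than 260 kb) is an N50-style summary statistic. A minimal Python sketch of how such a value can be computed from a list of scaffold lengths is given here. The function name `n50`, the optional `total` argument (for measuring against a genome size rather than the assembly size), and the toy lengths are illustrative assumptions, not the authors' code or data.

```python
def n50(lengths, total=None):
    """Return the largest length L such that scaffolds of length >= L
    together cover at least half of `total`.

    `total` defaults to the summed scaffold length; passing a genome
    size instead gives the genome-relative variant of the statistic.
    """
    if total is None:
        total = sum(lengths)
    covered = 0
    # Walk from the longest scaffold down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length
    return None  # the scaffolds cover less than half of `total`


# Made-up scaffold lengths (in kb), for illustration only.
toy_scaffolds_kb = [500, 400, 300, 200, 100]
print(n50(toy_scaffolds_kb))              # relative to assembly size
print(n50(toy_scaffolds_kb, total=2000))  # relative to an assumed genome size
```

Measuring against a fixed genome size rather than the assembly's own total makes values comparable between assemblies of different completeness, which is presumably closer to how the abstract's "half of the noncentromeric genome" is meant.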
Pages: 10249-10254
Page count: 6