Reconstructing complex regions of genomes using long-read sequencing technology

Cited by: 182
Authors
Huddleston, John [1, 2]
Ranade, Swati [3 ]
Malig, Maika [1 ]
Antonacci, Francesca [4 ]
Chaisson, Mark [1 ]
Hon, Lawrence
Sudmant, Peter H. [1 ]
Graves, Tina A. [5 ]
Alkan, Can [6 ]
Dennis, Megan Y. [1 ]
Wilson, Richard K. [5 ]
Turner, Stephen W. [3 ]
Korlach, Jonas [3 ]
Eichler, Evan E. [1, 2]
Affiliations
[1] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
[2] Univ Washington, Howard Hughes Med Inst, Seattle, WA 98195 USA
[3] Pacific Biosci Calif Inc, Menlo Pk, CA 94025 USA
[4] Univ Bari, Dept Biol, I-70126 Bari, Italy
[5] Washington Univ, Genome Inst, St Louis, MO 63110 USA
[6] Bilkent Univ, Dept Comp Engn, TR-06800 Ankara, Turkey
Funding
US National Institutes of Health;
Keywords
COPY NUMBER VARIATION; ASSEMBLIES; INVERSION; DIVERSITY;
DOI
10.1101/gr.168450.113
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Obtaining high-quality sequence continuity of complex regions of recent segmental duplication remains one of the major challenges of finishing genome assemblies. In the human and mouse genomes, this was achieved by targeting large-insert clones using costly and laborious capillary-based sequencing approaches. Sanger shotgun sequencing of clone inserts, however, has now been largely abandoned, leaving most of these regions unresolved in newer genome assemblies generated primarily by next-generation sequencing hybrid approaches. Here we show that it is possible to resolve regions that are complex in a genome-wide context but simple in isolation for a fraction of the time and cost of traditional methods using long-read single molecule, real-time (SMRT) sequencing and assembly technology from Pacific Biosciences (PacBio). We sequenced and assembled BAC clones corresponding to a 1.3-Mbp complex region of chromosome 17q21.31, demonstrating 99.994% identity to Sanger assemblies of the same clones. We targeted 44 differences using Illumina sequencing and find that PacBio and Sanger assemblies share a comparable number of validated variants, albeit with different sequence context biases. Finally, we targeted a poorly assembled 766-kbp duplicated region of the chimpanzee genome and resolved the structure and organization for a fraction of the cost and time of traditional finishing approaches. Our data suggest a straightforward path for upgrading genomes to a higher quality finished state.
Pages: 688-696
Page count: 9