Single haplotype assembly of the human genome from a hydatidiform mole

Cited by: 82
Authors
Steinberg, Karyn Meltz [1]
Schneider, Valerie A. [2]
Graves-Lindsay, Tina A. [1]
Fulton, Robert S. [1]
Agarwala, Richa [2]
Huddleston, John [3, 4]
Shiryev, Sergey A. [2]
Morgulis, Aleksandr [2]
Surti, Urvashi [5]
Warren, Wesley C. [1]
Church, Deanna M. [6]
Eichler, Evan E. [3, 4]
Wilson, Richard K. [1]
Affiliations
[1] Washington Univ, Genome Inst, St Louis, MO 63108 USA
[2] NIH, Natl Ctr Biotechnol Informat, Natl Lib Med, Bethesda, MD 20894 USA
[3] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
[4] Univ Washington, Howard Hughes Med Inst, Seattle, WA 98795 USA
[5] Univ Pittsburgh, Dept Pathol & Human Genet, Pittsburgh, PA 15260 USA
[6] Personalis Inc, Menlo Pk, CA 94025 USA
Keywords
COPY-NUMBER VARIATION; SEGMENTAL DUPLICATIONS; STRUCTURAL VARIATION; BREAKPOINT REGION; SEQUENCE; MAP; DIVERSITY; EVOLUTION; GENES; ISOCHROMOSOME;
DOI
10.1101/gr.180893.114
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification code
071010; 081704;
Abstract
A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. In order to overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA, including a set of end-sequenced and indexed BAC clones and 100x Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content shows this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicates misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future. This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly.
Pages: 2066-2076
Page count: 11