Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly

Cited by: 659
Authors
Schneider, Valerie A. [1]
Graves-Lindsay, Tina [2]
Howe, Kerstin [3]
Bouk, Nathan [1]
Chen, Hsiu-Chuan [1]
Kitts, Paul A. [1]
Murphy, Terence D. [1]
Pruitt, Kim D. [1]
Thibaud-Nissen, Francoise [1]
Albracht, Derek [2]
Fulton, Robert S. [2]
Kremitzki, Milinn [2]
Magrini, Vincent [2, 10]
Markovic, Chris [2]
McGrath, Sean [2]
Steinberg, Karyn Meltz [2]
Auger, Kate [3]
Chow, William [3]
Collins, Joanna [3]
Harden, Glenn [3]
Hubbard, Timothy [3, 11]
Pelan, Sarah [3]
Simpson, Jared T. [3, 12, 13]
Threadgold, Glen [3]
Torrance, James [3]
Wood, Jonathan M. [3]
Clarke, Laura [4]
Koren, Sergey [5]
Boitano, Matthew [6]
Peluso, Paul [6]
Li, Heng [7]
Chin, Chen-Shan [6]
Phillippy, Adam M. [5]
Durbin, Richard
Wilson, Richard K. [2]
Flicek, Paul [4]
Eichler, Evan E. [8, 9]
Church, Deanna M. [1, 14]
Affiliations
[1] NIH, Natl Ctr Biotechnol Informat, Natl Lib Med, Bethesda, MD 20894 USA
[2] Washington Univ, McDonnell Genome Inst, St Louis, MO 63018 USA
[3] Wellcome Trust Sanger Inst, Wellcome Genome Campus, Cambridge CB10 1SA, England
[4] European Bioinformat Inst, European Mol Biol Lab, Wellcome Genome Campus, Cambridge CB10 1SD, England
[5] Natl Human Genome Res Inst, NIH, Bethesda, MD 20892 USA
[6] Pacific Biosci, Menlo Pk, CA 94025 USA
[7] Broad Inst, Cambridge, MA 02142 USA
[8] Univ Washington, Sch Med, Dept Genome Sci, Seattle, WA 98195 USA
[9] Univ Washington, Howard Hughes Med Inst, Seattle, WA 98195 USA
[10] Nationwide Childrens Hosp, Columbus, OH 43205 USA
[11] Kings Coll London, London WC2R 2LS, England
[12] Ontario Inst Canc Res, Toronto, ON M5G 0A3, Canada
[13] Univ Toronto, Dept Comp Sci, Toronto, ON M5S 2E4, Canada
[14] 10X Genom, Pleasanton, CA 94566 USA
Funding
National Institutes of Health (USA); Wellcome Trust (UK)
Keywords
COPY-NUMBER VARIATION; SEGMENTAL DUPLICATIONS; STRUCTURAL VARIATION; SEQUENCE; RESOURCE; CHROMOSOMES; ANNOTATION; DIVERSITY; FRAMEWORK; ADMIXTURE
DOI
10.1101/gr.213611.116
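Chinese Library Classification (CLC) numbers
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
070307 [Chemical Biology]; 071010 [Biochemistry and Molecular Biology]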
Abstract
The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.
Pages: 849-864
Page count: 16