Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies

Cited by: 66
Authors
McCartney, Ann M. [1]
Shafin, Kishwar [2]
Alonge, Michael [3]
Bzikadze, Andrey V. [4]
Formenti, Giulio [5,6]
Fungtammasan, Arkarachai [7]
Howe, Kerstin [8]
Jain, Chirag [1,9]
Koren, Sergey [1]
Logsdon, Glennis A. [10]
Miga, Karen H. [2,11]
Mikheenko, Alla [12]
Paten, Benedict [2]
Shumate, Alaina [13]
Soto, Daniela C. [14]
Sovic, Ivan [15,16]
Wood, Jonathan M. D. [8]
Zook, Justin M. [17]
Phillippy, Adam M. [1]
Rhie, Arang [1]
Affiliations
[1] NHGRI, Genome Informat Sect, Computat & Stat Genom Branch, NIH, Bethesda, MD 20892 USA
[2] Univ Calif Santa Cruz, UC Santa Cruz Genom Inst, Santa Cruz, CA 95064 USA
[3] Johns Hopkins Univ, Dept Comp Sci, Baltimore, MD 21218 USA
[4] Univ Calif San Diego, Grad Program Bioinformat & Syst Biol, La Jolla, CA 92093 USA
[5] Rockefeller Univ, Lab Neurogenet Language, 1230 York Ave, New York, NY 10021 USA
[6] Rockefeller Univ, Vertebrate Genome Lab, 1230 York Ave, New York, NY 10021 USA
[7] DNAnexus, Mountain View, CA USA
[8] Wellcome Sanger Inst, Cambridge, England
[9] Indian Inst Sci, Dept Computat & Data Sci, Bangalore, Karnataka, India
[10] Univ Washington, Sch Med, Dept Genome Sci, Seattle, WA USA
[11] Univ Calif Santa Cruz, Dept Biomol Engn, Santa Cruz, CA 95064 USA
[12] St Petersburg State Univ, Ctr Algorithm Biotechnol, Inst Translat Biomed, St Petersburg, Russia
[13] Johns Hopkins Univ, Dept Biomed Engn, Baltimore, MD USA
[14] Univ Calif Davis, Genome Ctr, MIND Inst, Dept Biochem & Mol Med, Davis, CA 95616 USA
[15] Pacific Biosci, Menlo Pk, CA USA
[16] Digital BioL Doo, Ivanic Grad, Croatia
[17] NIST, Biosyst & Biomat Div, Gaithersburg, MD 20899 USA
Funding
National Institutes of Health (NIH); National Science Foundation (NSF)
DOI
10.1038/s41592-022-01440-3
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
070307 [Chemical Biology]
Abstract
The work describes the validation and polishing strategies developed by the Telomere-to-Telomere consortium for evaluating and improving the first complete human genome assembly.
Advances in long-read sequencing technologies and genome assembly methods have enabled the recent completion of the first telomere-to-telomere human genome assembly, which resolves complex segmental duplications and large tandem repeats, including centromeric satellite arrays in a complete hydatidiform mole (CHM13). Although derived from highly accurate sequences, evaluation revealed evidence of small errors and structural misassemblies in the initial draft assembly. To correct these errors, we designed a new repeat-aware polishing strategy that made accurate assembly corrections in large repeats without overcorrection, ultimately fixing 51% of the existing errors and improving the assembly quality value from 70.2 to 73.9, as measured from PacBio high-fidelity and Illumina k-mers. By comparing our results to standard automated polishing tools, we outline common polishing errors and offer practical suggestions for genome projects with limited resources. We also show how sequencing biases in both high-fidelity and Oxford Nanopore Technologies reads cause signature assembly errors that can be corrected with a diverse panel of sequencing technologies.
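To put the two quality values in the abstract on a more intuitive scale, the sketch below converts a Phred-scaled quality value (QV) to a per-base error probability using the standard definition QV = -10 log10(E). It assumes the reported assembly QV follows that convention; the function names are illustrative and are not taken from the paper or its tools, and the sketch does not reproduce the k-mer-based estimation the authors used.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Per-base error probability implied by a Phred-scaled quality value."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Phred-scaled quality value for a per-base error probability in (0, 1]."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in the interval (0, 1]")
    return -10 * math.log10(error_rate)


# QV values quoted in the abstract (before and after polishing).
before = qv_to_error_rate(70.2)
after = qv_to_error_rate(73.9)

print(f"QV 70.2: roughly one error per {1 / before:,.0f} bases")
print(f"QV 73.9: roughly one error per {1 / after:,.0f} bases")
```

Because the scale is logarithmic, each additional 10 QV units corresponds to a tenfold lower error probability.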
Pages: 687-+
Page count: 23