Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions

Cited by: 1088
Authors
Burton, Joshua N. [1]
Adey, Andrew [1]
Patwardhan, Rupali P. [1]
Qiu, Ruolan [1]
Kitzman, Jacob O. [1]
Shendure, Jay [1]
Affiliations
[1] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
Funding
National Science Foundation (USA);
Keywords
MAMMALIAN GENOMES; PRINCIPLES; ALIGNMENT; MODEL; MAPS;
DOI
10.1038/nbt.2727
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Genomes assembled de novo from short reads are highly fragmented relative to the finished chromosomes of Homo sapiens and key model organisms generated by the Human Genome Project. To address this problem, we need scalable, cost-effective methods to obtain assemblies with chromosome-scale contiguity. Here we show that genome-wide chromatin interaction data sets, such as those generated by Hi-C, are a rich source of long-range information for assigning, ordering and orienting genomic sequences to chromosomes, including across centromeres. To exploit this finding, we developed an algorithm that uses Hi-C data for ultra-long-range scaffolding of de novo genome assemblies. We demonstrate the approach by combining shotgun fragment and short jump mate-pair sequences with Hi-C data to generate chromosome-scale de novo assemblies of the human, mouse and Drosophila genomes, achieving, for the human genome, 98% accuracy in assigning scaffolds to chromosome groups and 99% accuracy in ordering and orienting scaffolds within chromosome groups. Hi-C data can also be used to validate chromosomal translocations in cancer genomes.
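The abstract's first step, assigning scaffolds to chromosome groups from Hi-C data, can be illustrated with a toy example. The sketch below is not the authors' implementation. It shows one common way to do such grouping: greedy average-linkage agglomerative clustering on normalized link counts. The function name `cluster_contigs`, the input layout, and the normalization by contig length are all assumptions made for illustration, and the link counts are invented toy values.

```python
from itertools import combinations


def cluster_contigs(links, lengths, n_groups):
    """Greedy average-linkage agglomerative clustering of contigs.

    links:    dict mapping frozenset({a, b}) -> number of Hi-C read pairs
              linking contigs a and b (hypothetical input layout)
    lengths:  dict mapping contig name -> length in bp; used here as a
              simple normalizer, which is an assumption of this sketch
    n_groups: number of chromosome groups to stop at
    """
    def density(a, b):
        # Link count normalized by the product of the two contig lengths.
        return links.get(frozenset((a, b)), 0) / (lengths[a] * lengths[b])

    def avg_linkage(c1, c2):
        # Mean pairwise link density between two clusters.
        total = sum(density(a, b) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    # Start with every contig in its own cluster.
    clusters = [frozenset([name]) for name in lengths]
    while len(clusters) > n_groups:
        # Merge the pair of clusters with the highest average link density.
        c1, c2 = max(combinations(clusters, 2), key=lambda p: avg_linkage(*p))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters


# Toy data: A/B share many links, C/D share many links, cross links are rare.
toy_lengths = {"A": 1000, "B": 1000, "C": 1000, "D": 1000}
toy_links = {
    frozenset(("A", "B")): 90,
    frozenset(("C", "D")): 80,
    frozenset(("A", "C")): 5,
    frozenset(("B", "D")): 3,
}
groups = cluster_contigs(toy_links, toy_lengths, n_groups=2)
print(sorted(sorted(g) for g in groups))
```

The sketch relies on the property the abstract highlights: sequences on the same chromosome share far more Hi-C contacts than sequences on different chromosomes, so merging the most densely linked clusters first tends to recover chromosome groups. Ordering and orienting scaffolds within each group are separate steps not covered here.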
Pages: 1119 / +
Page count: 9
Related papers
33 in total
[11]   High-quality draft assemblies of mammalian genomes from massively parallel sequence data [J].
Gnerre, Sante ;
MacCallum, Iain ;
Przybylski, Dariusz ;
Ribeiro, Filipe J. ;
Burton, Joshua N. ;
Walker, Bruce J. ;
Sharpe, Ted ;
Hall, Giles ;
Shea, Terrance P. ;
Sykes, Sean ;
Berlin, Aaron M. ;
Aird, Daniel ;
Costello, Maura ;
Daza, Riza ;
Williams, Louise ;
Nicol, Robert ;
Gnirke, Andreas ;
Nusbaum, Chad ;
Lander, Eric S. ;
Jaffe, David B. .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2011, 108 (04) :1513-1518
[12]   Genome 10K: A Proposal to Obtain Whole-Genome Sequence for 10 000 Vertebrate Species [J].
Haussler, David ;
O'Brien, Stephen J. ;
Ryder, Oliver A. ;
Barker, F. Keith ;
Clamp, Michele ;
Crawford, Andrew J. ;
Hanner, Robert ;
Hanotte, Olivier ;
Johnson, Warren E. ;
McGuire, Jimmy A. ;
Miller, Webb ;
Murphy, Robert W. ;
Murphy, William J. ;
Sheldon, Frederick H. ;
Sinervo, Barry ;
Venkatesh, Byrappa ;
Wiley, Edward O. ;
Allendorf, Fred W. ;
Amato, George ;
Baker, C. Scott ;
Bauer, Aaron ;
Beja-Pereira, Albano ;
Bermingham, Eldredge ;
Bernardi, Giacomo ;
Bonvicino, Cibele R. ;
Brenner, Sydney ;
Burke, Terry ;
Cracraft, Joel ;
Diekhans, Mark ;
Edwards, Scott ;
Ericson, Per G. P. ;
Estes, James ;
Fjelsda, Jon ;
Flesness, Nate ;
Gamble, Tony ;
Gaubert, Philippe ;
Graphodatsky, Alexander S. ;
Graves, Jennifer A. Marshall ;
Green, Eric D. ;
Green, Richard E. ;
Hackett, Shannon ;
Hebert, Paul ;
Helgen, Kristofer M. ;
Joseph, Leo ;
Kessing, Bailey ;
Kingsley, David M. ;
Lewin, Harris A. ;
Luikart, Gordon ;
Martelli, Paolo ;
Moreira, Miguel A. M. .
JOURNAL OF HEREDITY, 2009, 100 (06) :659-674
[13]   Jung J, 2003, J GLOBAL OPTIM, V25, P91
[14]   Haplotype-resolved genome sequencing of a Gujarati Indian individual [J].
Kitzman, Jacob O. ;
MacKenzie, Alexandra P. ;
Adey, Andrew ;
Hiatt, Joseph B. ;
Patwardhan, Rupali P. ;
Sudmant, Peter H. ;
Ng, Sarah B. ;
Alkan, Can ;
Qiu, Ruolan ;
Eichler, Evan E. ;
Shendure, Jay .
NATURE BIOTECHNOLOGY, 2011, 29 (01) :59-+
[15]   Genome mapping on nanochannel arrays for structural variation analysis and sequence assembly [J].
Lam, Ernest T. ;
Hastie, Alex ;
Lin, Chin ;
Ehrlich, Dean ;
Das, Somes K. ;
Austin, Michael D. ;
Deshpande, Paru ;
Cao, Han ;
Nagarajan, Niranjan ;
Xiao, Ming ;
Kwok, Pui-Yan .
NATURE BIOTECHNOLOGY, 2012, 30 (08) :771-776
[16]   Initial sequencing and analysis of the human genome [J].
Lander, ES ;
Int Human Genome Sequencing Consortium ;
Linton, LM ;
Birren, B ;
Nusbaum, C ;
Zody, MC ;
Baldwin, J ;
Devon, K ;
Dewar, K ;
Doyle, M ;
FitzHugh, W ;
Funke, R ;
Gage, D ;
Harris, K ;
Heaford, A ;
Howland, J ;
Kann, L ;
Lehoczky, J ;
LeVine, R ;
McEwan, P ;
McKernan, K ;
Meldrim, J ;
Mesirov, JP ;
Miranda, C ;
Morris, W ;
Naylor, J ;
Raymond, C ;
Rosetti, M ;
Santos, R ;
Sheridan, A ;
Sougnez, C ;
Stange-Thomann, N ;
Stojanovic, N ;
Subramanian, A ;
Wyman, D ;
Rogers, J ;
Sulston, J ;
Ainscough, R ;
Beck, S ;
Bentley, D ;
Burton, J ;
Clee, C ;
Carter, N ;
Coulson, A ;
Deadman, R ;
Deloukas, P ;
Dunham, A ;
Dunham, I ;
Durbin, R ;
French, L .
NATURE, 2001, 409 (6822) :860-921
[17]   The Genomic and Transcriptomic Landscape of a HeLa Cell Line [J].
Landry, Jonathan J. M. ;
Pyl, Paul Theodor ;
Rausch, Tobias ;
Zichner, Thomas ;
Tekkedil, Manu M. ;
Stuetz, Adrian M. ;
Jauch, Anna ;
Aiyar, Raeka S. ;
Pau, Gregoire ;
Delhomme, Nicolas ;
Gagneur, Julien ;
Korbel, Jan O. ;
Huber, Wolfgang ;
Steinmetz, Lars M. .
G3-GENES GENOMES GENETICS, 2013, 3 (08) :1213-1224
[18]   Fast and accurate long-read alignment with Burrows-Wheeler transform [J].
Li, Heng ;
Durbin, Richard .
BIOINFORMATICS, 2010, 26 (05) :589-595
[19]   De novo assembly of human genomes with massively parallel short read sequencing [J].
Li, Ruiqiang ;
Zhu, Hongmei ;
Ruan, Jue ;
Qian, Wubin ;
Fang, Xiaodong ;
Shi, Zhongbin ;
Li, Yingrui ;
Li, Shengting ;
Shan, Gao ;
Kristiansen, Karsten ;
Li, Songgang ;
Yang, Huanming ;
Wang, Jian ;
Wang, Jun .
GENOME RESEARCH, 2010, 20 (02) :265-272
[20]   Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome [J].
Lieberman-Aiden, Erez ;
van Berkum, Nynke L. ;
Williams, Louise ;
Imakaev, Maxim ;
Ragoczy, Tobias ;
Telling, Agnes ;
Amit, Ido ;
Lajoie, Bryan R. ;
Sabo, Peter J. ;
Dorschner, Michael O. ;
Sandstrom, Richard ;
Bernstein, Bradley ;
Bender, M. A. ;
Groudine, Mark ;
Gnirke, Andreas ;
Stamatoyannopoulos, John ;
Mirny, Leonid A. ;
Lander, Eric S. ;
Dekker, Job .
SCIENCE, 2009, 326 (5950) :289-293