Extending partial haplotypes to full genome haplotypes using chromosome conformation capture data

Cited by: 14
Authors
Ben-Elazar, Shay [1,2]
Chor, Benny [1]
Yakhini, Zohar [3,4,5]
Affiliations
[1] Tel Aviv Univ, Dept Comp Sci, IL-69978 Tel Aviv, Israel
[2] Microsoft R&D, Herzlyia, Israel
[3] Agilent Labs, Tel Aviv, Israel
[4] Technion Israel Inst Technol, Dept Comp Sci, Haifa, Israel
[5] Herzeliya Interdisciplinary Ctr, Sch Comp Sci, Herzliyya, Israel
DOI
10.1093/bioinformatics/btw453
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification code
070307 [Chemical Biology]
Abstract
Motivation: Complex interactions among alleles often drive differences in inherited properties, including disease predisposition. Isolating the effects of these interactions requires phasing information that is difficult to measure or infer. Furthermore, prevalent sequencing technologies used in the essential first step of determining a haplotype limit the range of that step to the span of reads, namely hundreds of bases. With the advent of pseudo-long read technologies, observable partial haplotypes can span several orders of magnitude more. Yet measuring whole-genome, single-individual haplotypes remains a challenge. A different view of whole-genome measurement addresses the 3D structure of the genome, with great development of Hi-C techniques in recent years. A shortcoming of current Hi-C, however, is the difficulty in inferring information that is specific to each of a pair of homologous chromosomes.
Results: In this work, we develop a robust algorithmic framework that takes two measurement-derived datasets, raw Hi-C and partial short-range haplotypes, and constructs the full-genome haplotype as well as phased diploid Hi-C maps. By analyzing both datasets together, we bridge important gaps in both technologies: from short to long haplotypes and from unphased to phased Hi-C. We demonstrate that our method can recover ground-truth haplotypes with high accuracy, using measured biological data as well as simulated data. We analyze the impact of noise, Hi-C sequencing depth and measured haplotype lengths on performance. Finally, we use the inferred 3D structure of a human genome to point at nuclear co-localization of transcription factor targets.
Pages: 559-566
Page count: 8