HapTree: A Novel Bayesian Framework for Single Individual Polyplotyping Using NGS Data

Cited by: 59
Authors
Berger, Emily [1, 2, 3]
Yorukoglu, Deniz [2]
Peng, Jian [1, 2]
Berger, Bonnie [1, 2]
Affiliations
[1] MIT, Dept Math, Cambridge, MA 02139 USA
[2] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
[3] Univ Calif Berkeley, Dept Math, Berkeley, CA USA
Keywords
GENOME SEQUENCE DATA; HAPLOTYPE RECONSTRUCTION; ACCURATE ALGORITHM; INFERENCE; PHASE
DOI
10.1371/journal.pcbi.1003502
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
070307 [Chemical Biology]
Abstract
As the more recent next-generation sequencing (NGS) technologies provide longer read sequences, the use of sequencing datasets for complete haplotype phasing is fast becoming a reality, allowing haplotype reconstruction of a single sequenced genome. Nearly all previous haplotype reconstruction studies have focused on diploid genomes and are rarely scalable to genomes with higher ploidy. Yet computational investigations into polyploid genomes carry great importance, impacting plant, yeast and fish genomics, as well as the studies of the evolution of modern-day eukaryotes and (epi)genetic interactions between copies of genes. In this paper, we describe a novel maximum-likelihood estimation framework, HapTree, for polyploid haplotype assembly of an individual genome using NGS read datasets. We evaluate the performance of HapTree on simulated polyploid sequencing read data modeled after Illumina sequencing technologies. For triploid and higher ploidy genomes, we demonstrate that HapTree substantially improves haplotype assembly accuracy and efficiency over the state-of-the-art; moreover, HapTree is the first scalable polyplotyping method for higher ploidy. As a proof of concept, we also test our method on real sequencing data from NA12878 (1000 Genomes Project) and evaluate the quality of assembled haplotypes with respect to trio-based diplotype annotation as the ground truth. The results indicate that HapTree significantly improves the switch accuracy within phased haplotype blocks as compared to existing haplotype assembly methods, while producing comparable minimum error correction (MEC) values. A summary of this paper appears in the proceedings of the RECOMB 2014 conference, April 2-5.
Pages: 10