HapCUT: an efficient and accurate algorithm for the haplotype assembly problem

被引:197
作者
Bansal, Vikas [1 ]
Bafna, Vineet [1 ]
机构
[1] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
基金
美国国家科学基金会;
关键词
D O I
10.1093/bioinformatics/btn298
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Motivation: The goal of the haplotype assembly problem is to reconstruct the two haplotypes (chromosomes) for an individual using a mix of sequenced fragments from the two chromosomes. This problem has been shown to be computationally intractable for various optimization criteria. Polynomial time algorithms have been proposed for restricted versions of the problem. In this article, we consider the haplotype assembly problem in the most general setting, i.e. fragments of any length and with an arbitrary number of gaps. Results: We describe a novel combinatorial approach for the haplotype assembly problem based on computing max-cuts in certain graphs derived from the sequenced fragments. Levy et al. have sequenced the complete genome of a human individual and used a greedy heuristic to assemble the haplotypes for this individual. We have applied our method HapCUT to infer haplotypes from this data and demonstrate that the haplotypes inferred using HapCUT are significantly more accurate (2025 lower maximum error correction scores for all chromosomes) than the greedy heuristic and a previously published method, Fast Hare. We also describe a maximum likelihood based estimator of the absolute accuracy of the sequence-based haplotypes using population haplotypes from the International HapMap project.
引用
收藏
页码:I153 / I159
页数:7
相关论文
共 19 条
[1]   A haplotype map of the human genome [J].
Altshuler, D ;
Brooks, LD ;
Chakravarti, A ;
Collins, FS ;
Daly, MJ ;
Donnelly, P ;
Gibbs, RA ;
Belmont, JW ;
Boudreau, A ;
Leal, SM ;
Hardenbol, P ;
Pasternak, S ;
Wheeler, DA ;
Willis, TD ;
Yu, FL ;
Yang, HM ;
Zeng, CQ ;
Gao, Y ;
Hu, HR ;
Hu, WT ;
Li, CH ;
Lin, W ;
Liu, SQ ;
Pan, H ;
Tang, XL ;
Wang, J ;
Wang, W ;
Yu, J ;
Zhang, B ;
Zhang, QR ;
Zhao, HB ;
Zhao, H ;
Zhou, J ;
Gabriel, SB ;
Barry, R ;
Blumenstiel, B ;
Camargo, A ;
Defelice, M ;
Faggart, M ;
Goyette, M ;
Gupta, S ;
Moore, J ;
Nguyen, H ;
Onofrio, RC ;
Parkin, M ;
Roy, J ;
Stahl, E ;
Winchester, E ;
Ziaugra, L ;
Shen, Y .
NATURE, 2005, 437 (7063) :1299-1320
[2]   Polynomial and APX-hard cases of the individual haplotyping problem [J].
Bafna, V ;
Istrail, S ;
Lancia, G ;
Rizzi, R .
THEORETICAL COMPUTER SCIENCE, 2005, 335 (01) :109-125
[3]   Haplotyping as perfect phylogeny: A direct approach [J].
Bafna, V ;
Gusfield, D ;
Lancia, G ;
Yooseph, S .
JOURNAL OF COMPUTATIONAL BIOLOGY, 2003, 10 (3-4) :323-340
[4]  
BANSAL V, 2008, GENOME RES
[5]  
Cilibrasi R, 2005, LECT NOTES COMPUT SC, V3692, P128
[6]  
Eskin Eleazar, 2003, J Bioinform Comput Biol, V1, P1, DOI 10.1142/S0219720003000174
[7]  
Garey MR, 1979, Computers and Intractablity: A Guide to the Theoryof NP-Completeness
[8]  
Gusfield D., 2002, PROC 6 ANN INT C COM, P166, DOI DOI 10.1145/565196.565218.
[9]   Diploid genome reconstruction of Ciona intestinalis and comparative analysis with Ciona savignyi [J].
Kim, Jong Hyun ;
Waterman, Michael S. ;
Li, Lei M. .
GENOME RESEARCH, 2007, 17 (07) :1101-1110
[10]  
LANCIA G, 2001, P 9 ANN EUR S ALG ES, P182