Haplotype assembly from aligned weighted SNP fragments

被引:47
作者
Zhao, YY [1 ]
Wu, LY [1 ]
Zhang, JH [1 ]
Wang, RS [1 ]
Zhang, XS [1 ]
机构
[1] Chinese Acad Sci, Acad Math & Syst Sci, Inst Appl Math, Beijing 100080, Peoples R China
基金
中国国家自然科学基金;
关键词
SNP; haplotype; assembly; minimum letter flips; dynamic clustering;
D O I
10.1016/j.compbiolchem.2005.05.001
中图分类号
Q [生物科学];
学科分类号
07 ; 0710 ; 09 ;
摘要
Given an assembled genome of a diploid organism the haplotype assembly problem can be formulated as retrieval of a pair of haplotypes from a set of aligned weighted SNP fragments. Known computational formulations (models) of this problem are minimum letter flips (MLF) and the weighted minimum letter flips (WMLF; Greenberg et al. (INFORMS J. Comput. 2004,14, 211-213)). In this paper we show that the general WMLF model is NP-hard even for the gapless case. However the algorithmic solutions for selected variants of WMFL can exist and we propose a heuristic algorithm based on a dynamic clustering technique. We also introduce a new formulation of the haplotype assembly problem that we call COMPLETE WMLF (CWMLF). This model and algorithms for its implementation take into account a simultaneous presence of multiple kinds of data errors. Extensive computational experiments indicate that the algorithmic implementations of the CWMLF model achieve higher accuracy of haplotype reconstruction than the WMLF-based algorithms, which in turn appear to be more accurate than those based on MLF. (c) 2005 Elsevier Ltd. All rights reserved.
引用
收藏
页码:281 / 287
页数:7
相关论文
共 13 条
[1]   It's raining SNPs, hallelujah? [J].
Chakravarti, A .
NATURE GENETICS, 1998, 19 (03) :216-217
[2]  
CLARK AG, 1990, MOL BIOL EVOL, V7, P111
[3]   High-resolution haplotype structure in the human genome [J].
Daly, MJ ;
Rioux, JD ;
Schaffner, SE ;
Hudson, TJ ;
Lander, ES .
NATURE GENETICS, 2001, 29 (02) :229-232
[4]   WHOLE-GENOME RANDOM SEQUENCING AND ASSEMBLY OF HAEMOPHILUS-INFLUENZAE RD [J].
FLEISCHMANN, RD ;
ADAMS, MD ;
WHITE, O ;
CLAYTON, RA ;
KIRKNESS, EF ;
KERLAVAGE, AR ;
BULT, CJ ;
TOMB, JF ;
DOUGHERTY, BA ;
MERRICK, JM ;
MCKENNEY, K ;
SUTTON, G ;
FITZHUGH, W ;
FIELDS, C ;
GOCAYNE, JD ;
SCOTT, J ;
SHIRLEY, R ;
LIU, LI ;
GLODEK, A ;
KELLEY, JM ;
WEIDMAN, JF ;
PHILLIPS, CA ;
SPRIGGS, T ;
HEDBLOM, E ;
COTTON, MD ;
UTTERBACK, TR ;
HANNA, MC ;
NGUYEN, DT ;
SAUDEK, DM ;
BRANDON, RC ;
FINE, LD ;
FRITCHMAN, JL ;
FUHRMANN, JL ;
GEOGHAGEN, NSM ;
GNEHM, CL ;
MCDONALD, LA ;
SMALL, KV ;
FRASER, CM ;
SMITH, HO ;
VENTER, JC .
SCIENCE, 1995, 269 (5223) :496-512
[5]  
Garey M. R., 1979, Computers and intractability. A guide to the theory of NP-completeness
[6]   Opportunities for combinatorial optimization in computational biology [J].
Greenberg, HJ ;
Hart, WE ;
Lancia, G .
INFORMS JOURNAL ON COMPUTING, 2004, 16 (03) :211-231
[7]   Inference of haplotypes from samples of diploid populations: Complexity and algorithms [J].
Gusfield, D .
JOURNAL OF COMPUTATIONAL BIOLOGY, 2001, 8 (03) :305-323
[8]  
Helmuth L, 2001, SCIENCE, V293, P583
[9]  
Lippert Ross, 2002, Brief Bioinform, V3, P23, DOI 10.1093/bib/3.1.23
[10]  
MacQueen J., 1967, P 5 BERK S MATH STAT, V14, P281, DOI DOI 10.1234/12345678