Linear time probabilistic algorithms for the singular haplotype reconstruction problem from SNP fragments

被引:25
作者
Chen, Zhixiang [1 ]
Fu, Bin [1 ]
Schweller, Robert [1 ]
Yang, Boting [2 ]
Zhao, Zhiyu [3 ]
Zhu, Binhai [4 ]
机构
[1] Univ Texas Pan Amer, Dept Comp Sci, Edinburg, TX 78539 USA
[2] Univ Regina, Dept Comp Sci, Regina, SK S4S 0A2, Canada
[3] Univ New Orleans, Dept Comp Sci, New Orleans, LA 70148 USA
[4] Montana State Univ, Dept Comp Sci, Bozeman, MT 59717 USA
关键词
inconsistency and incompleteness errors; linear time probabilistic algorithm; probabilistic modeling and analysis; singular haplotype reconstruction; SNP fragments;
D O I
10.1089/cmb.2008.0003
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
In this paper, we develop a probabilistic model to approach two realistic scenarios regarding the singular haplotype reconstruction problem-the incompleteness and inconsistency that occurred in the DNA sequencing process to generate the input haplotype fragments, and the common practice used to generate synthetic data in experimental algorithm studies. We design three algorithms in the model that can reconstruct the two unknown haplotypes from the given matrix of haplotype fragments with provable high probability and in linear time in the size of the input matrix. We also present experimental results that conform with the theoretical efficient performance of those algorithms. The software of our algorithms is available for public access and for real-time on-line demonstration.
引用
收藏
页码:535 / 546
页数:12
相关论文
共 21 条
[1]  
[Anonymous], 2000, RANDOMIZED ALGORITHM
[2]   Polynomial and APX-hard cases of the individual haplotyping problem [J].
Bafna, V ;
Istrail, S ;
Lancia, G ;
Rizzi, R .
THEORETICAL COMPUTER SCIENCE, 2005, 335 (01) :109-125
[3]  
Cilibrasi R, 2005, LECT NOTES COMPUT SC, V3692, P128
[4]  
CLARK AG, 1990, MOL BIOL EVOL, V7, P111
[5]  
Genovese LM, 2007, LECT N BIOINFORMAT, V4645, P49
[6]  
GLANCIA RR, 2006, OPER RES LETT, V34, P289
[7]  
GUSFIELD D, 2002, 6 ANN INT C COMP BIO, P166
[8]  
GUSFIELD D, 2000, 8 INT C INT SYST MOL, P183
[9]   Sequence variability and candidate gene analysis in complex disease:: association of μ opioid receptor gene variation with substance dependence [J].
Hoehe, MR ;
Köpke, K ;
Wendel, B ;
Rohde, K ;
Flachmeier, C ;
Kidd, KK ;
Berrettini, WH ;
Church, GM .
HUMAN MOLECULAR GENETICS, 2000, 9 (19) :2895-2908
[10]   Haplotyping populations by pure parsimony: Complexity of exact and approximation algorithms [J].
Lancia, G ;
Pinotti, MC ;
Rizzi, R .
INFORMS JOURNAL ON COMPUTING, 2004, 16 (04) :348-359