Inference of haplotypes from samples of diploid populations: Complexity and algorithms

被引:100
作者
Gusfield, D [1 ]
机构
[1] Univ Calif Davis, Dept Comp Sci, Davis, CA 95616 USA
关键词
haplotype inference; integer programming; genetic screens; SNPs; algorithms;
D O I
10.1089/10665270152530863
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
The next phase of human genomics will involve large-scale screens of populations for significant DNA polymorphisms, notably single nucleotide polymorphisms (SNPs). Dense human SNP maps are currently under construction. However, the utility of those maps and screens will be limited by the fact that humans are diploid and it is presently difficult to get separate data on the two "copies." Hence, genotype (blended) SNP data will be collected, and the desired haplotype (partitioned) data must then be (partially) inferred. A particular nondeterministic inference algorithm was proposed and studied by Clark (1990) and extensively used by Clark et al. (1998). In this paper, we more closely examine that inference method and the question of whether we can obtain an efficient, deterministic variant to optimize the obtained inferences. We show that the problem is NP-hard and, in fact, Max-SNP complete; that the reduction creates problem instances conforming to a severe restriction believed to hold in real data (Clark, 1990); and that even if we first use a natural exponential-time operation, the remaining optimization problem is NP-hard. However, we also develop, implement, and test an approach based on that operation and (integer) linear programming. The approach works quickly and correctly on simulated data.
引用
收藏
页码:305 / 323
页数:19
相关论文
共 21 条
[1]  
[Anonymous], 1979, Computers and Intractablity: A Guide to the Theoryof NP-Completeness
[2]   It's raining SNPs, hallelujah? [J].
Chakravarti, A .
NATURE GENETICS, 1998, 19 (03) :216-217
[3]   Haplotype structure and population genetic inferences from nucleotide-sequence variation in human lipoprotein lipase [J].
Clark, AG ;
Weiss, KM ;
Nickerson, DA ;
Taylor, SL ;
Buchanan, A ;
Stengård, J ;
Salomaa, V ;
Vartiainen, E ;
Perola, M ;
Boerwinkle, E ;
Sing, CF .
AMERICAN JOURNAL OF HUMAN GENETICS, 1998, 63 (02) :595-612
[4]  
FULLERTON M, 2000, AM J HUM GENET, P881
[5]  
GABOW H, COMMUNICATION
[6]  
Gabow H. N., 1976, IEEE Transactions on Software Engineering, VSE-2, P227, DOI 10.1109/TSE.1976.233819
[7]  
Gusfield D, 2000, Proc Int Conf Intell Syst Mol Biol, V8, P183
[8]  
Hagmann M, 1999, SCIENCE, V285, P21
[9]  
HUBBEL E, COMMUNICATION
[10]  
Hudson R.R., 1990, Oxford Surveys in Evolutionary Biology, V7, P1