The number of recombination events in a sample history: Conflict graph and lower bounds

被引:33
作者
Bafna, V [1 ]
Bansal, V [1 ]
机构
[1] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
关键词
recombination; phylogenetic networks; ancestral recombination graph; haplotypes; lower bounds; conflict graph; NP-completeness;
D O I
10.1109/TCBB.2004.23
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
We consider the following problem: Given a set of binary sequences, determine lower bounds on the minimum number of recombinations required to explain the history of the sample, under the infinite-sites model of mutation. The problem has implications for finding recombination hotspots and for the Ancestral Recombination Graph reconstruction problem [29]. Hudson and Kaplan [15] gave a lower bound based on the four-gamete test. In practice, their bound R-m often greatly underestimates the minimum number of recombinations. The problem was recently revisited by Myers and Griffiths [22], who introduced two new lower bounds R-h and R-s which are provably better, and also yield good bounds in practice. However, the worst-case complexities of their procedures for computing R-h and R-s are exponential and super-exponential, respectively. In this paper, we show that the number of nontrivial connected components, R-c, in the conflict graph [4] for a given set of sequences, computable in time O(nm(2)), is also a lower bound on the minimum number of recombination events. We show that in many cases, R-c is a better bound than R-h. The conflict graph was used by Gusfield et al. [4] to obtain a polynomial time algorithm for the galled tree problem, which is a special case of the Ancestral Recombination Graph (ARG) reconstruction problem. Our results also offer some insight into the structural properties of this graph and are of interest for the general Ancestral Recombination Graph reconstruction problem.
引用
收藏
页码:78 / 90
页数:13
相关论文
共 29 条
[1]   Haplotype structure and population genetic inferences from nucleotide-sequence variation in human lipoprotein lipase [J].
Clark, AG ;
Weiss, KM ;
Nickerson, DA ;
Taylor, SL ;
Buchanan, A ;
Stengård, J ;
Salomaa, V ;
Vartiainen, E ;
Perola, M ;
Boerwinkle, E ;
Sing, CF .
AMERICAN JOURNAL OF HUMAN GENETICS, 1998, 63 (02) :595-612
[2]   Evidence for substantial fine-scale variation in recombination rates across the human genome [J].
Crawford, DC ;
Bhangale, T ;
Li, N ;
Hellenthal, G ;
Rieder, MJ ;
Nickerson, DA ;
Stephens, M .
NATURE GENETICS, 2004, 36 (07) :700-706
[3]   High-resolution haplotype structure in the human genome [J].
Daly, MJ ;
Rioux, JD ;
Schaffner, SE ;
Hudson, TJ ;
Lander, ES .
NATURE GENETICS, 2001, 29 (02) :229-232
[4]  
Fearnhead P, 2001, GENETICS, V159, P1299
[5]   Application of coalescent mediods to reveal fine-scale rate vairiation and recombination hotspots [J].
Fearnhead, P ;
Harding, RM ;
Schneider, JA ;
Myers, S ;
Donnelly, P .
GENETICS, 2004, 167 (04) :2067-2081
[6]   The structure of haplotype blocks in the human genome [J].
Gabriel, SB ;
Schaffner, SF ;
Nguyen, H ;
Moore, JM ;
Roy, J ;
Blumenstiel, B ;
Higgins, J ;
DeFelice, M ;
Lochner, A ;
Faggart, M ;
Liu-Cordero, SN ;
Rotimi, C ;
Adeyemo, A ;
Cooper, R ;
Ward, R ;
Lander, ES ;
Daly, MJ ;
Altshuler, D .
SCIENCE, 2002, 296 (5576) :2225-2229
[7]   Ancestral inference from samples of DNA sequences with recombination [J].
Griffiths, RC ;
Marjoram, P .
JOURNAL OF COMPUTATIONAL BIOLOGY, 1996, 3 (04) :479-502
[8]   Efficient reconstruction of phylogenetic networks with constrained recombination [J].
Gusfield, D ;
Eddhu, S ;
Langley, C .
PROCEEDINGS OF THE 2003 IEEE BIOINFORMATICS CONFERENCE, 2003, :363-374
[9]   EFFICIENT ALGORITHMS FOR INFERRING EVOLUTIONARY TREES [J].
GUSFIELD, D .
NETWORKS, 1991, 21 (01) :19-28
[10]  
GUSFIELD D, 2004, FUNDAMENTAL EFFICIEN