Evolution of simple sequence repeats

Cited by: 19
Author
Bell, GI
Affiliation
[1] Theor. Biol. and Biophysics MS K710, Los Alamos National Laboratory, Los Alamos
Source
COMPUTERS & CHEMISTRY | 1996 / Vol. 20 / Issue 01
DOI
10.1016/S0097-8485(96)80006-4
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Simple sequence repeats (SSRs) are common and frequently polymorphic in eukaryotic DNA. Many are subject to high rates of length mutation, most often the gain or loss of a single repeat unit. Can their observed abundances and length distributions be explained as the result of an unbiased random walk starting from some initial repeat length? To address this question, we have considered two models for an unbiased random walk on the integers $n$ ($n \ge n_0$). The first is a continuous-time process (Birth and Death Model, or BDM) in which a transition to $n + 1$ and a transition to $n - 1$ each occur with probability $\lambda k$ per unit time, with $k = n - n_0 + 1$. The second is a discrete-time model (Random Walk Model, or RWM), in which a transition is made at each time step, either to $n - 1$ or to $n + 1$. In each case the walks start at length $n_0$, and new walks are generated at a steady rate $S$, the source rate, determined by the rate at which base substitutions create repeats from neighboring sequences. Each walk terminates when $n$ reaches $n_0 - 1$ or at some time $T$, which reflects the contamination of pure repeat sequences by other mutations that remove them from consideration, either because they no longer satisfy the criteria for repeat selection from a database or because they can no longer undergo efficient length mutations. For infinite $T$, the results are particularly simple for $N(k)$, the expected number of repeats of length $n = k + n_0 - 1$: $N(k) = S/(k\lambda)$ for BDM and $N(k) = 2S$ for RWM. For finite $T$ there is in each case a cut-off value of $k$, namely $k = T\lambda \ln 2$ for BDM and $k = 0.57\sqrt{T}$ for RWM; for larger values of $k$, $N(k)$ rapidly falls below the infinite-time limit. We argue that these results may be compared with SSR length distributions averaged over many loci, but not with the distribution at a particular locus, for which founder effects are important. For the data of Beckmann & Weber [(1992), Genomics 12, 627] on GT·AC repeats in humans, each model gives a reasonable fit, with the source at two repeat units ($n_0 = 2$). Both the absolute number of loci and their length distribution are well represented.
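The discrete-time model described in the abstract can be illustrated numerically. The following is a minimal sketch, not the author's code: it assumes that the steady-state expected number of walks at position $k$ equals the source rate $S$ times the sum, over walk ages 0 to $T$ inclusive, of the probability that a walk of that age sits at $k$. The function name `rwm_expected_counts` and its parameters are hypothetical, and the choice to count a walk of age exactly $T$ is an assumption.

```python
def rwm_expected_counts(source_rate, max_age, k_max):
    """Expected steady-state number of walks at k = 1..k_max (index 0 unused).

    Assumed setup: each walk starts at k = 1 (n = n0), steps +1 or -1 with
    probability 1/2 per time step, and ends on reaching k = 0 or after
    max_age steps. New walks start at source_rate per time step.
    """
    size = max(max_age, k_max) + 3
    prob = [0.0] * size  # prob[k]: chance a walk of the current age sits at k
    prob[1] = 1.0        # every walk starts at k = 1
    counts = [0.0] * (k_max + 1)
    for _age in range(max_age + 1):  # assumption: ages 0..T inclusive count
        for k in range(1, k_max + 1):
            counts[k] += source_rate * prob[k]
        nxt = [0.0] * size
        for k in range(1, size - 1):
            if prob[k] > 0.0:
                nxt[k + 1] += 0.5 * prob[k]
                nxt[k - 1] += 0.5 * prob[k]
        nxt[0] = 0.0  # walks reaching k = 0 (n = n0 - 1) terminate
        prob = nxt
    return counts


if __name__ == "__main__":
    counts = rwm_expected_counts(source_rate=1.0, max_age=400, k_max=60)
    for k in (1, 5, 10, 20, 40, 60):
        print(k, round(counts[k], 3))
```

Under these assumptions the values for small $k$ stay below, and approach, the infinite-time limit $2S$ as `max_age` grows, while values for $k$ well above $\sqrt{T}$ drop off sharply, which is the qualitative behavior the abstract describes.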
Pages: 41-48
Page count: 8