THE SIZE DISTRIBUTION OF INSERTIONS AND DELETIONS IN HUMAN AND RODENT PSEUDOGENES SUGGESTS THE LOGARITHMIC GAP PENALTY FOR SEQUENCE ALIGNMENT

Times cited: 126
Authors
GU, X [1]
LI, WH [1]
Affiliation
[1] UNIV TEXAS, SPH, CTR HUMAN GENET, HOUSTON, TX 77225 USA
Keywords
DELETIONS; INSERTIONS; PSEUDOGENES; GAP PENALTY; SEQUENCE ALIGNMENT
DOI
10.1007/BF00164032
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
The size distributions of deletions, insertions, and indels (i.e., insertions or deletions) were studied using 78 human processed pseudogenes and other published data sets. The following results were obtained: (1) Deletions occur more frequently than insertions in sequence evolution; none of the pseudogenes studied shows significantly more insertions than deletions. (2) Empirically, the size distributions of deletions, insertions, and indels are well described by a power law, $f(k) = Ck^{-b}$, where $f(k)$ is the frequency of deletions, insertions, or indels of gap length $k$, $b$ is the power parameter, and $C$ is the normalization factor. (3) The estimates of $b$ for deletions and for insertions from the same data set are approximately equal, indicating that deletions and insertions have approximately the same size distribution. (4) The estimates of $b$ vary little among data sets, indicating that local sequence structure has an effect but plays only a secondary role in the size distribution of deletions and insertions. (5) The linear gap penalty, the form most commonly used in sequence alignment, is not supported by our analysis; rather, the power law for the size distribution of indels suggests that an appropriate gap penalty is $w(k) = a + b \ln k$, where $a$ is the gap creation cost and $b \ln k$ is the gap extension cost. (6) The higher frequency of deletions than insertions suggests that the gap creation cost of an insertion ($a_i$) should be larger than that of a deletion ($a_d$); that is, $a_i - a_d = \ln R$, where $R$ is the frequency ratio of deletions to insertions.
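Results (2), (5) and (6) can be linked by a short argument if one assumes, as a working assumption not spelled out in this abstract, that a gap of length $k$ is penalized in proportion to the negative log of its frequency, and that deletions and insertions share the exponent $b$ but occur in overall proportions $p_d$ and $p_i$ (symbols introduced here for illustration, with $R = p_d/p_i$). A sketch under those assumptions, which may differ from the paper's own derivation:

```latex
% Assumed: w(k) = -\ln f(k), up to a positive scale factor
f(k) = C k^{-b}
\;\Longrightarrow\;
-\ln f(k) = -\ln C + b \ln k
\;\Longrightarrow\;
w(k) = a + b \ln k, \qquad a = -\ln C

% Contrast: a geometric (exponentially decaying) size distribution gives the linear penalty
f(k) = C e^{-\lambda k}
\;\Longrightarrow\;
-\ln f(k) = -\ln C + \lambda k

% Deletions and insertions with a common exponent b (p_d, p_i: assumed overall proportions)
f_d(k) = p_d C k^{-b}, \qquad f_i(k) = p_i C k^{-b}
w_d(k) = -\ln p_d - \ln C + b \ln k, \qquad w_i(k) = -\ln p_i - \ln C + b \ln k
a_i - a_d = \ln p_d - \ln p_i = \ln \frac{p_d}{p_i} = \ln R
```

The second case shows why a linear penalty corresponds to a geometric size distribution, whereas a power law yields a penalty that grows only with $\ln k$; and since $R > 1$ when deletions dominate, $a_i > a_d$.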
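Result (2) rests on fitting $f(k) = Ck^{-b}$ to observed gap lengths. The abstract does not say how $b$ was estimated, so the sketch below swaps in a discrete maximum-likelihood fit truncated at an assumed maximum gap length kmax; the function names truncated_zeta and fit_power_law are hypothetical. It takes a mapping from gap length to count and returns $b$ together with $C = 1/\sum_{k=1}^{k_{max}} k^{-b}$. The example counts are synthetic, generated from the power law itself, not data from the paper.

```python
import math


def truncated_zeta(b: float, kmax: int) -> float:
    """Normalizing sum Z(b) = sum of k**(-b) for k = 1..kmax."""
    return sum(k ** -b for k in range(1, kmax + 1))


def fit_power_law(counts: dict[int, int], kmax: int,
                  lo: float = 0.0, hi: float = 20.0,
                  tol: float = 1e-10) -> tuple[float, float]:
    """Maximum-likelihood fit of f(k) = C * k**(-b) on 1 <= k <= kmax.

    `counts` maps gap length k to the number of gaps observed with that length.
    Returns (b, C), where C = 1 / Z(b) normalizes f over 1..kmax.
    The search for b is restricted to the assumed interval [lo, hi].
    """
    if any(k < 1 or k > kmax for k in counts):
        raise ValueError("gap lengths must lie in 1..kmax")
    n = sum(counts.values())
    mean_log_k = sum(c * math.log(k) for k, c in counts.items()) / n

    def score(b: float) -> float:
        # Derivative of the mean log-likelihood  -b*mean_log_k - ln Z(b):
        # the model's expected ln k minus the observed mean ln k.
        # It decreases in b (the log-likelihood is concave), so bisection works.
        z = truncated_zeta(b, kmax)
        model_mean = sum(k ** -b * math.log(k) for k in range(1, kmax + 1)) / z
        return model_mean - mean_log_k

    if score(lo) <= 0:        # optimum at or below the lower bound
        b_hat = lo
    elif score(hi) >= 0:      # optimum at or above the upper bound
        b_hat = hi
    else:
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if score(mid) > 0:
                lo = mid      # likelihood still increasing: root lies above mid
            else:
                hi = mid
        b_hat = 0.5 * (lo + hi)
    return b_hat, 1.0 / truncated_zeta(b_hat, kmax)


# Synthetic counts proportional to k**(-1.7) (illustrative, not data from the paper).
kmax = 50
synthetic = {k: round(1e6 * k ** -1.7) for k in range(1, kmax + 1)}
b_hat, C_hat = fit_power_law(synthetic, kmax)
print(f"b = {b_hat:.3f}, C = {C_hat:.3f}")  # b should come out close to 1.7
```

Maximum likelihood is used here rather than a least-squares line through log counts because it weights each observed gap equally; whether that matches the paper's procedure is not stated in this record.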
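Results (5) and (6) translate into an alignment scoring scheme. The sketch below, with hypothetical function names and illustrative parameter values, scores a global alignment of a functional gene x against a pseudogene y, charging $w_d(k) = a_d + b \ln k$ for a deletion (x residues opposite gaps) and $w_i(k) = a_i + b \ln k$ with $a_i = a_d + \ln R$ for an insertion. Because the logarithmic penalty is not affine, the usual affine-gap recurrences do not apply; this uses a general gap-function dynamic program in the style of Waterman, Smith and Beyer, with three matrices so that one gap is never scored as two adjacent pieces. It returns the minimum cost only, not the alignment itself.

```python
import math
from typing import Callable

GapCost = Callable[[int], float]
INF = float("inf")


def log_gap(a: float, b: float) -> GapCost:
    """Logarithmic gap cost w(k) = a + b*ln(k) for a gap of length k >= 1."""
    return lambda k: a + b * math.log(k)


def align_cost(x: str, y: str, mismatch: float,
               w_del: GapCost, w_ins: GapCost) -> float:
    """Minimum global alignment cost of x (functional gene) against y (pseudogene).

    Matches cost 0 and mismatches cost `mismatch`. A run of k residues of x
    opposite gaps in y is a deletion costing w_del(k); a run of k residues of y
    opposite gaps in x is an insertion costing w_ins(k). Runtime is
    O(n*m*(n+m)), acceptable for short sequences.
    """
    n, m = len(x), len(y)
    # M: alignment of x[:i], y[:j] ending with x[i-1] aligned to y[j-1]
    # Del: ending with a deletion; Ins: ending with an insertion
    M = [[INF] * (m + 1) for _ in range(n + 1)]
    Del = [[INF] * (m + 1) for _ in range(n + 1)]
    Ins = [[INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                prev = min(M[i - 1][j - 1], Del[i - 1][j - 1], Ins[i - 1][j - 1])
                M[i][j] = prev + (0.0 if x[i - 1] == y[j - 1] else mismatch)
            if i > 0:
                # A deletion of length k ending at row i may not follow another
                # deletion, so one long gap is never priced as two shorter ones.
                Del[i][j] = min(min(M[i - k][j], Ins[i - k][j]) + w_del(k)
                                for k in range(1, i + 1))
            if j > 0:
                Ins[i][j] = min(min(M[i][j - k], Del[i][j - k]) + w_ins(k)
                                for k in range(1, j + 1))
    return min(M[n][m], Del[n][m], Ins[n][m])


# Illustrative (hypothetical) parameters: a_d = 2, b = 1, R = 2.
a_d, b, R = 2.0, 1.0, 2.0
w_del = log_gap(a_d, b)
w_ins = log_gap(a_d + math.log(R), b)  # a_i = a_d + ln R

print(align_cost("ACGT", "AGT", 3.0, w_del, w_ins))     # 2.0: one 1-residue deletion
print(align_cost("ACGT", "ACCGT", 3.0, w_del, w_ins))   # about 2.693: one insertion
print(align_cost("AAAAAA", "AA", 3.0, w_del, w_ins))    # about 3.386: one 4-residue deletion
```

With these values a single inserted residue costs $2 + \ln 2$ while a single deleted residue costs 2, and one 4-residue deletion ($2 + \ln 4$) is cheaper than two 2-residue deletions ($4 + 2\ln 2$). Faster algorithms exist for concave gap functions; the cubic recurrence is used here only for clarity.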
Pages: 464-473
Number of pages: 10