Logarithmic gap costs decrease alignment accuracy

Citations: 19
Authors
Cartwright, Reed A. [1]
Affiliations
[1] Univ Georgia, Dept Genet, Athens, GA 30602 USA
[2] N Carolina State Univ, Bioinformat Res Ctr, Raleigh, NC 27695 USA
Keywords
DOI
10.1186/1471-2105-7-527
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Background: Studies on the distribution of indel sizes have consistently found that they obey a power law. This finding has led several scientists to propose that logarithmic gap costs, G(k) = a + c ln k, are more biologically realistic than affine gap costs, G(k) = a + bk, for sequence alignment. Since quick and efficient affine costs are currently the most popular way to globally align sequences, the goal of this paper is to determine whether logarithmic gap costs improve alignment accuracy significantly enough to merit their use over the faster affine gap costs.
Results: A group of simulated sequence pairs was globally aligned using affine, logarithmic, and log-affine gap costs. Alignment accuracy was calculated by comparing the resulting alignments to the actual alignments of the sequence pairs. Gap costs were then compared based on average alignment accuracy. Log-affine gap costs had the best accuracy, followed closely by affine gap costs, while logarithmic gap costs performed poorly. Subsequently, a model was developed to explain the results.
Conclusion: In contrast to initial expectations, logarithmic gap costs produce poor alignments and are actually not implied by the power-law behavior of gap sizes, given typical match and mismatch costs. Furthermore, affine gap costs not only produce accurate alignments but are also good approximations to biologically realistic gap costs. This work provides added confidence for the biological relevance of existing alignment algorithms.
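The gap cost functions named in the abstract can be sketched in a few lines of Python. This is only an illustration of the formulas, not the author's implementation: the affine and logarithmic forms are taken from the abstract, while the log-affine form G(k) = a + bk + c ln k is an assumption (the abstract names it without defining it), and the parameter values below are hypothetical, chosen only to show how the three functions grow with gap length k.

```python
import math


def affine_gap_cost(k, a, b):
    """Affine cost from the abstract: G(k) = a + b*k, for a gap of length k >= 1."""
    return a + b * k


def logarithmic_gap_cost(k, a, c):
    """Logarithmic cost from the abstract: G(k) = a + c*ln(k), for k >= 1."""
    return a + c * math.log(k)


def log_affine_gap_cost(k, a, b, c):
    """Log-affine cost, ASSUMED here to be G(k) = a + b*k + c*ln(k)."""
    return a + b * k + c * math.log(k)


if __name__ == "__main__":
    # Hypothetical parameters, not values from the paper.
    a, b, c = 4.0, 1.0, 2.0
    print("k  affine  logarithmic  log-affine")
    for k in (1, 2, 10, 100):
        print(
            k,
            round(affine_gap_cost(k, a, b), 3),
            round(logarithmic_gap_cost(k, a, c), 3),
            round(log_affine_gap_cost(k, a, b, c), 3),
        )
```

Under the affine form, extending a gap by one position always costs b; under the logarithmic form, the extension cost is c ln((k+1)/k), which shrinks toward zero as the gap grows, so long gaps become nearly free to extend.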
Pages: 12