Problems and Solutions for Estimating Indel Rates and Length Distributions

Cited by: 50
Authors
Cartwright, Reed A. [1]
Affiliations
[1] N Carolina State Univ, Dept Genet, Bioinformat Res Ctr, Raleigh, NC 27695 USA
Funding
National Institutes of Health (USA);
Keywords
SEQUENCE ALIGNMENT; DNA-SEQUENCES; EM ALGORITHM; HUMAN GENOME; GAP COSTS; EVOLUTION; DIVERGENCE; INSERTIONS; CHIMPANZEE; DELETIONS;
DOI
10.1093/molbev/msn275
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification code
071010; 081704;
Abstract
Insertions and deletions (indels) are fundamental but understudied components of molecular evolution. Here we present an expectation-maximization algorithm built on a pair hidden Markov model that is able to properly handle indels in neutrally evolving DNA sequences. From a data set of orthologous introns, we estimate relative rates and length distributions of indels among primates and rodents. This technique has the advantage of potentially handling large genomic data sets. We find that a zeta power-law model of indel lengths provides a much better fit than the traditional geometric model and that indel processes are conserved between our taxa. The estimated relative rates are about 12-16 indels per 100 substitutions, and the estimated power-law magnitudes are about 1.6-1.7. More significantly, we find that using the traditional geometric/affine model of indel lengths introduces artifacts into evolutionary analysis, casting doubt on studies of the evolution and diversity of indel formation using traditional models and invalidating measures of species divergence that include indel lengths.
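The contrast the abstract draws between a zeta power-law and a geometric model of indel lengths can be illustrated with a small numerical sketch. This is not the paper's expectation-maximization procedure; it only compares the two length distributions. The exponent z = 1.65 is an assumed value picked from the 1.6-1.7 range quoted above, and matching the two models on the probability of a length-1 indel is an illustrative choice, as are all function names.

```python
import math


def zeta(z, n_terms=10_000):
    """Approximate the Riemann zeta function for z > 1.

    Sums the first n_terms - 1 terms directly and adds an
    Euler-Maclaurin estimate of the remaining tail.
    """
    n = n_terms
    head = sum(k ** -z for k in range(1, n))
    tail = n ** (1 - z) / (z - 1) + 0.5 * n ** -z
    return head + tail


def zeta_pmf(k, z):
    """P(length = k) under a zeta power-law model, k = 1, 2, ..."""
    return k ** -z / zeta(z)


def geometric_pmf(k, q):
    """P(length = k) under a geometric model with extension probability q."""
    return (1 - q) * q ** (k - 1)


def zeta_tail(k_min, z):
    """P(length >= k_min) under the zeta model."""
    norm = zeta(z)
    return 1.0 - sum(k ** -z for k in range(1, k_min)) / norm


def geometric_tail(k_min, q):
    """P(length >= k_min) under the geometric model."""
    return q ** (k_min - 1)


# Assumed exponent, taken from the 1.6-1.7 range in the abstract.
Z = 1.65
# Illustrative choice: give both models the same P(length = 1).
Q = 1.0 - zeta_pmf(1, Z)

if __name__ == "__main__":
    for k_min in (5, 20, 50):
        print(k_min, zeta_tail(k_min, Z), geometric_tail(k_min, Q))
```

Under these assumptions the geometric tail decays exponentially while the zeta tail decays as a power of the length, so long indels that are plausible under the power-law model become vanishingly rare under the geometric one.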
Pages: 473-480
Page count: 8