DNA assembly with gaps (Dawg): simulating sequence evolution

Cited by: 89
Authors
Cartwright, RA [1]
Affiliation
[1] Univ Georgia, Dept Genet, Athens, GA 30602 USA
DOI
10.1093/bioinformatics/bti1200
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Relationships amongst taxa are inferred from biological data using phylogenetic methods and procedures. Very few known phylogenies exist against which to test the accuracy of our inferences. Therefore, in the absence of biological data, simulated data must be used to test the accuracy of methods which produce these inferences. Researchers have limited or non-existent options for simulations useful for studying the impact of insertions, deletions, and alignments on phylogenetic accuracy.

Results: To satisfy this gap I have developed a new algorithm of indel formation and incorporated it into a new, flexible, and portable application for sequence simulation. The application, called Dawg, simulates phylogenetic evolution of DNA sequences in continuous time using the robust general time reversible model with gamma and invariant rate heterogeneity and a novel length-dependent model of indel formation. On completion, Dawg produces the true alignment of the simulated sequences. Unlike other applications, Dawg allows indel lengths to be explicitly distributed via a biologically realistic power law. Many options are available to allow users to customize their simulations and results. Because simulating with indels would be problematic if biologically realistic parameters could not be estimated, a script is provided with Dawg that can estimate the parameters of indel formation from sequence data. Dawg was applied to the sequences of four chloroplast trnK introns. It was used to parametrically bootstrap an estimation of the rate of indel formation for the phylogeny. Because Dawg can assist in parametric bootstrapping of sequence data it is useful beyond phylogenetics, such as studying alignment algorithms or parameters of molecular evolution.
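The abstract states that indel lengths can be drawn from a power law. As a rough illustration of that idea only, the following Python sketch samples lengths from a truncated discrete power law, P(L = k) proportional to k^(-a) for k = 1..max_len. It is not Dawg's implementation: the function names, the exponent value 1.5, and the truncation at 100 are assumptions chosen for the example and are not taken from the paper.

```python
import random

def power_law_weights(exponent, max_len):
    """Unnormalized weights k**(-exponent) for lengths 1..max_len.

    The truncation at max_len is an assumption of this sketch; it keeps
    the distribution finite so it can be sampled with plain weights.
    """
    return [k ** -exponent for k in range(1, max_len + 1)]

def sample_indel_lengths(n, exponent=1.5, max_len=100, seed=None):
    """Draw n indel lengths from the truncated discrete power law.

    exponent=1.5 and max_len=100 are illustrative defaults, not
    parameter estimates from the paper.
    """
    rng = random.Random(seed)
    lengths = list(range(1, max_len + 1))
    weights = power_law_weights(exponent, max_len)
    # random.choices normalizes the weights internally.
    return rng.choices(lengths, weights=weights, k=n)

lengths = sample_indel_lengths(10_000, exponent=1.5, max_len=100, seed=1)
```

Under this distribution short indels dominate while long ones remain possible, which is the qualitative behaviour a power law is meant to capture.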
Pages: 31-38
Page count: 8
Related papers
49 in total
[1]   NEW LOOK AT STATISTICAL-MODEL IDENTIFICATION [J].
AKAIKE, H .
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1974, AC19 (06) :716-723
[2]   EMPIRICAL AND STRUCTURAL MODELS FOR INSERTIONS AND DELETIONS IN THE DIVERGENT EVOLUTION OF PROTEINS [J].
BENNER, SA ;
COHEN, MA ;
GONNET, GH .
JOURNAL OF MOLECULAR BIOLOGY, 1993, 229 (04) :1065-1082
[3]   Reconstructing large regions of an ancestral mammalian genome in silico [J].
Blanchette, M ;
Green, ED ;
Miller, W ;
Haussler, D .
GENOME RESEARCH, 2004, 14 (12) :2412-2423
[4]  
BULL JJ, 1993, EVOLUTION, V47, P993, DOI 10.1111/j.1558-5646.1993.tb02130.x
[5]   Empirical analysis of protein insertions and deletions determining parameters for the correct placement of gaps in protein sequence alignments [J].
Chang, MSS ;
Benner, SA .
JOURNAL OF MOLECULAR BIOLOGY, 2004, 341 (02) :617-631
[6]   RATES AND PATTERNS OF CHLOROPLAST DNA EVOLUTION [J].
CLEGG, MT ;
GAUT, BS ;
LEARN, GH ;
MORTON, BR .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1994, 91 (15) :6795-6801
[7]  
FELSENSTEIN J, 1984, EVOLUTION, V38, P16, DOI 10.1111/j.1558-5646.1984.tb00255.x
[8]   EVOLUTIONARY TREES FROM DNA-SEQUENCES - A MAXIMUM-LIKELIHOOD APPROACH [J].
FELSENSTEIN, J .
JOURNAL OF MOLECULAR EVOLUTION, 1981, 17 (06) :368-376
[9]  
Felsenstein Joseph, 2004, Inferring Phylogenies, V2
[10]   EXACT STOCHASTIC SIMULATION OF COUPLED CHEMICAL-REACTIONS [J].
GILLESPIE, DT .
JOURNAL OF PHYSICAL CHEMISTRY, 1977, 81 (25) :2340-2361