Simulating DNA coding sequence evolution with EvolveAGene 3

Cited by: 35
Author
Hall, Barry G. [1]
Affiliation
[1] Bellingham Res Inst, Bellingham, WA USA
DOI
10.1093/molbev/msn008
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Phylogenetic reconstruction based upon multiple alignments of molecular sequences is important to most branches of modern biology and is central to molecular evolution. Understanding the historical relationships among macromolecules depends upon computer programs that implement a variety of analytical methods. Because it is impossible to know those historical relationships with certainty, assessment of the accuracy of methods and the programs that implement them requires the use of programs that realistically simulate the evolution of DNA sequences. EvolveAGene 3 is a realistic coding sequence simulation program that separates mutation from selection and allows the user to set selection conditions, including variable regions of selection intensity within the sequence and variation in intensity of selection over branches. Variation includes base substitutions, insertions, and deletions. To the best of my knowledge, it is the only program available that simulates the evolution of intact coding sequences. Output includes the true tree and true alignments of the resulting coding sequence and corresponding protein sequences. A log file reports the frequencies of each kind of base substitution, the ratio of transition to transversion substitutions, the ratio of indel to base substitution mutations, and the numbers of silent and amino acid replacement mutations. The realism of the data sets has been assessed by comparing the d(N)/d(S) ratio, the ratio of transition to transversion substitutions, and the ratio of indel to base substitution mutations of the simulated data sets with those parameters of real data sets from the "gold standard" BaliBase collection of structural alignments. Results show that the data sets produced by EvolveAGene 3 are very similar to real data sets, and EvolveAGene 3 is therefore a realistic simulation program that can be used to evaluate a variety of programs and methods in molecular evolution.
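The log-file statistics named in the abstract (transition versus transversion counts, silent versus amino acid replacement changes) can be illustrated with a minimal sketch. This is not EvolveAGene 3's code or output format. The function name `summarize_substitutions` is hypothetical, the sequences are assumed to be gap-free, in-frame, and of equal length, and the standard genetic code is assumed. Silent and replacement changes are counted per changed codon, which is a simplification.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: the
# sequences use the standard code; '*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
PURINES = {"A", "G"}


def summarize_substitutions(ancestor, descendant):
    """Count base substitutions between two aligned, gap-free coding sequences.

    Hypothetical helper for illustration only. Returns a dict with counts of
    transitions, transversions, and changed codons that are silent or that
    replace an amino acid.
    """
    if len(ancestor) != len(descendant) or len(ancestor) % 3 != 0:
        raise ValueError("sequences must be equal length and a multiple of 3")

    counts = {"transitions": 0, "transversions": 0, "silent": 0, "replacement": 0}
    for i in range(0, len(ancestor), 3):
        old, new = ancestor[i:i + 3], descendant[i:i + 3]
        changed = [(a, b) for a, b in zip(old, new) if a != b]
        for a, b in changed:
            # Purine<->purine or pyrimidine<->pyrimidine is a transition.
            if (a in PURINES) == (b in PURINES):
                counts["transitions"] += 1
            else:
                counts["transversions"] += 1
        if changed:
            if CODON_TABLE[old] == CODON_TABLE[new]:
                counts["silent"] += 1
            else:
                counts["replacement"] += 1
    return counts


if __name__ == "__main__":
    # CTT -> CTC is a silent transition; GAA -> GAT is a replacement transversion.
    print(summarize_substitutions("ATGCTTGAA", "ATGCTCGAT"))
```

Dividing the transition count by the transversion count gives the ratio the abstract refers to. A fuller comparison against real data sets would also need to handle indels, which this sketch leaves out.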
Pages: 688-695
Number of pages: 8