Non-homogeneous models of sequence evolution in the Bio++ suite of libraries and programs

Cited by: 98
Authors
Dutheil, Julien [1 ]
Boussau, Bastien [2 ]
Affiliations
[1] Univ Aarhus, Bioinformat Res Ctr, DK-8000 Aarhus C, Denmark
[2] Univ Lyon 1, Univ Lyon, CNRS, UMR 5558,Lab Biometr & Biol Evolut, F-69622 Villeurbanne, France
DOI
10.1186/1471-2148-8-255
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Background: Accurately modeling the sequence substitution process is required for the correct estimation of evolutionary parameters, be they phylogenetic relationships, substitution rates or ancestral states; it is also crucial to simulate realistic data sets. Such simulation procedures are needed to estimate the null-distribution of complex statistics, an approach referred to as parametric bootstrapping, and are also used to test the quality of phylogenetic reconstruction programs. It has often been observed that homologous sequences can vary widely in their nucleotide or amino-acid compositions, revealing that sequence evolution has changed importantly among lineages, and may therefore be most appropriately approached through non-homogeneous models. Several programs implementing such models have been developed, but they are limited in their possibilities: only a few particular models are available for likelihood optimization, and data sets cannot be easily generated using the resulting estimated parameters.
Results: We hereby present a general implementation of non-homogeneous models of substitutions. It is available as dedicated classes in the Bio++ libraries and can hence be used in any C++ program. Two programs that use these classes are also presented. The first one, Bio++ Maximum Likelihood (BppML), estimates parameters of any non-homogeneous model and the second one, Bio++ Sequence Generator (BppSeqGen), simulates the evolution of sequences from these models. These programs allow the user to describe non-homogeneous models through a property file with a simple yet powerful syntax, without any programming required.
Conclusion: We show that the general implementation introduced here can accommodate virtually any type of non-homogeneous models of sequence evolution, including heterotachous ones, while being computer efficient. We furthermore illustrate the use of such general models for parametric bootstrapping, using tests of non-homogeneity applied to an already published ribosomal RNA data set.
Pages: 12