Efficient Selection of Branch-Specific Models of Sequence Evolution

Cited by: 39
Authors
Dutheil, Julien Y. [1]
Galtier, Nicolas [1]
Romiguier, Jonathan [1]
Douzery, Emmanuel J. P. [1]
Ranwez, Vincent [1,2]
Boussau, Bastien [3,4]
Affiliations
[1] Univ Montpellier 2, Inst Sci Evolut Montpellier, Montpellier, France
[2] Montpellier SupAgro, UMR AGAP, Montpellier, France
[3] Univ Lyon 1, Lab Biometrie & Biol Evolut, CNRS, UMR5558, F-69622 Villeurbanne, France
[4] Univ Calif Berkeley, Dept Integrat Biol, Berkeley, CA 94720 USA
Funding
European Research Council
Keywords
molecular phylogenetics; maximum likelihood; ancestral character reconstruction; dN; dS; paml; selection; SUBSTITUTION RATES; LIKELIHOOD; TEMPERATURE; TIME; TREE; PHYLOGENETICS; SIMULATION; COMPLEXITY; RADIATION; LIBRARIES;
DOI
10.1093/molbev/mss059
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The analysis of extant sequences shows that molecular evolution has been heterogeneous through time and among lineages. However, for a given sequence alignment, it is often difficult to uncover what factors caused this heterogeneity. In fact, identifying and characterizing heterogeneous patterns of molecular evolution along a phylogenetic tree is very challenging, for lack of appropriate methods. Users either have to a priori define groups of branches along which they believe molecular evolution has been similar or have to allow each branch to have its own pattern of molecular evolution. The first approach assumes prior knowledge that is seldom available, and the second requires estimating an unreasonably large number of parameters. Here we propose a convenient and reliable approach where branches get clustered by their pattern of molecular evolution alone, with no need for prior knowledge about the data set under study. Model selection is achieved in a statistical framework and therefore avoids overparameterization. We rely on substitution mapping for efficiency and present two clustering approaches, depending on whether or not we expect neighbouring branches to share more similar patterns of sequence evolution than distant branches. We validate our method on simulations and test it on four previously published data sets. We find that our method correctly groups branches sharing similar equilibrium GC contents in a data set of ribosomal RNAs and recovers expected footprints of selection through dN/dS. Importantly, it also uncovers a new pattern of relaxed selection in a phylogeny of Mantellid frogs, which we are able to correlate to life-history traits. This shows that our programs should be very useful to study patterns of molecular evolution and reveal new correlations between sequence and species evolution.
Our programs can run on DNA, RNA, codon, or amino acid sequences with a large set of possible models of substitutions and are available at http://biopp.univ-montp2.fr/forge/testnh.
Pages: 1861-1874
Number of pages: 14