Investigating Protein-Coding Sequence Evolution with Probabilistic Codon Substitution Models

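Cited by: 107
Authors
Anisimova, Maria [1, 2]
Kosiol, Carolin [3]
Affiliations
[1] Swiss Federal Institute of Technology (ETH Zurich), Institute of Computational Science, Zurich, Switzerland
[2] Swiss Institute of Bioinformatics, Lausanne, Switzerland
[3] Cornell University, Department of Biological Statistics and Computational Biology, Ithaca, NY 14853, USA
Funding
U.S. National Science Foundation
Keywords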
AMINO-ACID SITES; DETECTING POSITIVE SELECTION; MAXIMUM-LIKELIHOOD-ESTIMATION; NONSYNONYMOUS NUCLEOTIDE SUBSTITUTION; DNA-SEQUENCES; MOLECULAR EVOLUTION; ADAPTIVE EVOLUTION; GENETIC ALGORITHM; BAYESIAN-INFERENCE; NATURAL-SELECTION;
DOI
10.1093/molbev/msn232
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular biology]
Subject classification codes
071010; 081704
Abstract
This review is motivated by the recent explosion in the number of studies that develop and improve probabilistic models of codon evolution. The first codon models were parametric and focused on estimating the effect of selective pressure on the protein through an explicit parameter in the maximum likelihood framework. Likelihood ratio tests of nested codon models armed biologists with powerful tools that provided unambiguous evidence for positive selection in real data. This, in turn, triggered a new wave of methodological developments. The new generation of models views the codon evolution process in a more sophisticated way, relaxing several mathematical assumptions. These models make greater use of physicochemical amino acid properties, the machinery of the genetic code, and the large amounts of publicly available data. We present an overview of the most recent advances in modeling codon evolution and discuss a wide range of their applications to real data. On the downside, the availability of a large variety of models, each accounting for different biological factors, increases the margin for misinterpretation: the biological meaning of certain parameters may vary among models, and model selection procedures deserve greater attention. A solid understanding of the modeling assumptions and their applicability is essential for successful statistical data analysis.
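The likelihood ratio test (LRT) of nested codon models mentioned in the abstract compares twice the log-likelihood difference between the richer (alternative) model and the simpler (null) model with a chi-square distribution whose degrees of freedom equal the number of extra free parameters. The sketch below is a minimal illustration under that standard asymptotic assumption, using only the Python standard library; the function names and the log-likelihood values in the usage lines are hypothetical and are not taken from the review. It also ignores the case where a constrained parameter lies on the boundary of its space, for which a mixture of chi-square distributions is often more appropriate.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Base cases: df = 1 reduces to erfc, df = 2 is a plain exponential.
    if df % 2 == 1:
        q, nu = math.erfc(math.sqrt(y)), 1
    else:
        q, nu = math.exp(-y), 2
    # Climb two degrees of freedom at a time using the regularized upper gamma recurrence
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), with a = nu / 2.
    while nu < df:
        a = nu / 2.0
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        nu += 2
    return q


def likelihood_ratio_test(lnL_null: float, lnL_alt: float, df: int) -> tuple[float, float]:
    """Return (2 * delta lnL, p-value) for nested models differing by df free parameters."""
    stat = 2.0 * (lnL_alt - lnL_null)
    # A nested alternative should never fit worse; clip tiny negatives from optimizer noise.
    stat = max(stat, 0.0)
    return stat, chi2_sf(stat, df)


if __name__ == "__main__":
    # Hypothetical log-likelihoods for a null model and an alternative with 2 extra parameters.
    stat, p = likelihood_ratio_test(lnL_null=-1000.0, lnL_alt=-995.0, df=2)
    print(f"2*dlnL = {stat:.2f}, p = {p:.4g}")
```

A small p-value is read as evidence that the extra parameters (for example, a class of sites with a nonsynonymous/synonymous rate ratio above one) improve the fit beyond what chance would produce; in practice, dedicated packages compute both the maximized likelihoods and the test.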
Pages: 255-271
Number of pages: 17
References (185 in total)
  • [91] Molecular footprint of drug-selective pressure in a human immunodeficiency virus transmission chain
    Lemey, P
    Derdelinckx, I
    Rambaut, A
    Van Laethem, K
    Dumont, S
    Vermeulen, S
    Van Wijngaerden, E
    Vandamme, AM
    [J]. JOURNAL OF VIROLOGY, 2005, 79 (18) : 11981 - 11989
  • [92] Evolutionary dynamics of human retroviruses investigated through full-genome scanning
    Lemey, P
    Van Dooren, S
    Vandamme, AM
    [J]. MOLECULAR BIOLOGY AND EVOLUTION, 2005, 22 (04) : 942 - 951
  • [93] Synonymous substitution rates predict HIV disease progression as a result of underlying replication dynamics
    Lemey, Philippe
    Kosakovsky Pond, Sergei L.
    Drummond, Alexei J.
    Pybus, Oliver G.
    Shapiro, Beth
    Barroso, Helena
    Taveira, Nuno
    Rambaut, Andrew
    [J]. PLOS COMPUTATIONAL BIOLOGY, 2007, 3 (02) : 282 - 292
  • [94] The metapopulation genetic algorithm: An efficient solution for the problem of phylogeny estimation
    Lemmon, AR
    Milinkovitch, MC
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2002, 99 (16) : 10516 - 10521
  • [95] Molecular evolution
    Li, WH
    [M]. 1997
  • [96] An algorithm for progressive multiple alignment of sequences with insertions
    Löytynoja, A
    Goldman, N
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2005, 102 (30) : 10557 - 10562
  • [97] Detecting amino acid sites under positive selection and purifying selection
    Massingham, T
    Goldman, N
    [J]. GENETICS, 2005, 169 (03) : 1753 - 1762
  • [98] Maynard Smith, J
    [J]. GENETICS, 1996, 142 : 1033
  • [99] A Gamma mixture model better accounts for among site rate heterogeneity
    Mayrose, I
    Friedman, N
    Pupko, T
    [J]. BIOINFORMATICS, 2005, 21 : 151 - 158
  • [100] Towards realistic codon models: among site variability and dependency of synonymous and non-synonymous rates
    Mayrose, Itay
    Doron-Faigenboim, Adi
    Bacharach, Eran
    Pupko, Tal
    [J]. BIOINFORMATICS, 2007, 23 (13) : i319 - i327