Basing population genetic inferences and models of molecular evolution upon desired stationary distributions of DNA or protein sequences

Cited by: 9
Authors
Choi, Sang Chul [1]
Redelings, Benjamin D. [1]
Thorne, Jeffrey L. [1]
Affiliations
[1] North Carolina State University, Bioinformatics Research Center, Raleigh, NC 27695 USA
Keywords
variable-length Markov model; profile hidden Markov model; insertion-deletion model; scaled selection coefficient; fitness; Pfam
DOI
10.1098/rstb.2008.0167
CLC classification number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Models of molecular evolution tend to be overly simplistic caricatures of biology that are prone to assigning high probabilities to biologically implausible DNA or protein sequences. Here, we explore how to construct time-reversible evolutionary models that yield stationary distributions of sequences that match given target distributions. By adopting comparatively realistic target distributions, evolutionary models can be improved. Instead of focusing on estimating parameters, we concentrate on the population genetic implications of these models. Specifically, we obtain estimates of the product of effective population size and relative fitness difference of alleles. The approach is illustrated with two applications to protein-coding DNA. In the first, a codon-based evolutionary model yields a stationary distribution of sequences, which, when the sequences are translated, matches a variable-length Markov model trained on human proteins. In the second, we introduce an insertion-deletion model that describes selectively neutral evolutionary changes to DNA. We then show how to modify the neutral model so that its stationary distribution at the amino acid level can match a profile hidden Markov model, such as the one associated with the Pfam database.
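The link the abstract draws between a target stationary distribution and scaled selection coefficients can be illustrated with a small sketch. It assumes a Halpern-Bruno-style construction, in which each neutral mutation rate is multiplied by a fixation factor S / (1 - exp(-S)); this is one standard way to obtain a time-reversible process with a prescribed stationary distribution, and it is not confirmed here to be the exact formulation used in the paper. The function names, the four-state alphabet, the target distribution `pi` and the mutation rates `mu` are all hypothetical. S is interpreted as proportional to the product of effective population size and the relative fitness difference, with the constant depending on ploidy.

```python
import math


def scaled_selection(pi_i, pi_j, mu_ij, mu_ji):
    """Scaled selection coefficient for a change i -> j.

    Assumed form: S = log((pi_j * mu_ji) / (pi_i * mu_ij)).
    """
    return math.log((pi_j * mu_ji) / (pi_i * mu_ij))


def fixation_factor(s):
    """Relative fixation rate S / (1 - exp(-S)); tends to 1 as S -> 0."""
    if abs(s) < 1e-8:
        # First-order expansion avoids 0/0 for (nearly) neutral changes.
        return 1.0 + s / 2.0
    return s / -math.expm1(-s)


def substitution_rates(pi, mu):
    """Build a rate matrix whose stationary distribution is pi.

    pi: target stationary probabilities (all positive).
    mu: neutral mutation rates; mu[j][i] must be positive
        whenever mu[i][j] is positive.
    """
    n = len(pi)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and mu[i][j] > 0:
                s = scaled_selection(pi[i], pi[j], mu[i][j], mu[j][i])
                q[i][j] = mu[i][j] * fixation_factor(s)
        # Diagonal entry makes each row sum to zero.
        q[i][i] = -sum(q[i])
    return q


if __name__ == "__main__":
    # Hypothetical target distribution and uniform mutation rates.
    pi = [0.1, 0.2, 0.3, 0.4]
    mu = [[0.0 if i == j else 1.0 for j in range(4)] for i in range(4)]
    q = substitution_rates(pi, mu)
    for row in q:
        print(["%.4f" % x for x in row])
```

Under these assumptions the resulting rates satisfy detailed balance, pi[i] * q[i][j] == pi[j] * q[j][i], which is what makes the process time-reversible with stationary distribution pi. In the paper's setting the states would be whole sequences rather than four symbols, so the matrix would never be built explicitly; the sketch only shows the per-change calculation.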
Pages: 3931-3939
Page count: 9