A "long indel" model for evolutionary sequence alignment

Cited by: 90
Authors
Miklós, I [1]
Lunter, GA [1]
Holmes, I [1]
Affiliations
[1] Univ Oxford, Dept Stat, Oxford OX1 3TG, England
Keywords
stochastic modeling of molecular evolution; structural alignment; maximum likelihood evolutionary time estimation
DOI
10.1093/molbev/msh043
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
We present a new probabilistic model of sequence evolution, allowing indels of arbitrary length, and give sequence alignment algorithms for our model. Previously implemented evolutionary models have allowed (at most) single-residue indels or have introduced artifacts such as the existence of indivisible "fragments." We compare our algorithm to these previous methods by applying it to the structural homology dataset HOMSTRAD, evaluating the accuracy of (1) alignments and (2) evolutionary time estimates. With our method, it is possible (for the first time) to integrate probabilistic sequence alignment, with reliability indicators and arbitrary gap penalties, in the same framework as phylogenetic reconstruction. Our alignment algorithm requires that we evaluate the likelihood of any specific path of mutation events in a continuous-time Markov model, with the event times integrated out. To this effect, we introduce a "trajectory likelihood" algorithm (Appendix A). We anticipate that this algorithm will be useful in more general contexts, such as Markov Chain Monte Carlo simulations.
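The quantity the abstract calls a trajectory likelihood can be illustrated with a minimal sketch. This is not the algorithm of Appendix A, which this record does not reproduce; it is the textbook closed form for the simpler case in which all visited states have pairwise distinct exit rates. The function name `trajectory_likelihood` and its parameters are hypothetical, chosen for illustration. For a path s_0 → s_1 → ... → s_n with total exit rates λ_0..λ_n and jump rates r_1..r_n, integrating the event times over [0, T] gives (∏ r_i) · Σ_i exp(-λ_i T) / ∏_{j≠i} (λ_j - λ_i).

```python
import math


def trajectory_likelihood(exit_rates, jump_rates, T):
    """Probability that a continuous-time Markov chain follows exactly the
    given path during [0, T] and sits in the last state at time T, with
    the event times integrated out.

    exit_rates: total exit rate of each visited state s_0..s_n
                (assumed pairwise distinct in this sketch)
    jump_rates: rate of each jump s_{i-1} -> s_i, length n
    T:          total elapsed time
    """
    if len(exit_rates) != len(jump_rates) + 1:
        raise ValueError("need one more exit rate than jump rates")
    if len(set(exit_rates)) != len(exit_rates):
        raise ValueError("this sketch requires pairwise distinct exit rates")

    # Convolution of exponential waiting-time densities, in closed form.
    total = 0.0
    for i, rate_i in enumerate(exit_rates):
        denom = 1.0
        for j, rate_j in enumerate(exit_rates):
            if j != i:
                denom *= rate_j - rate_i
        total += math.exp(-rate_i * T) / denom

    # math.prod of an empty list is 1, which covers the "no event" path.
    return math.prod(jump_rates) * total
```

With a single state the result reduces to exp(-λ_0 T), the probability that no event occurs. The closed form loses precision through cancellation when two exit rates are nearly equal, and it is undefined when they coincide, so a general-purpose implementation needs a different treatment of those cases.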
Pages: 529-540
Page count: 12
Related papers
18 in total
[1]   Estimation of reversible substitution matrices from multiple pairs of sequences [J].
Arvestad, L ;
Bruno, WJ .
JOURNAL OF MOLECULAR EVOLUTION, 1997, 45 (06) :696-703
[2]  
Durbin R., 1998, Biological sequence analysis: Probabilistic models of proteins and nucleic acids
[3]   AN IMPROVED ALGORITHM FOR MATCHING BIOLOGICAL SEQUENCES [J].
GOTOH, O .
JOURNAL OF MOLECULAR BIOLOGY, 1982, 162 (03) :705-708
[4]   Statistical alignment: Computational properties, homology testing and goodness-of-fit [J].
Hein, J ;
Wiuf, C ;
Knudsen, B ;
Moller, MB ;
Wibling, G .
JOURNAL OF MOLECULAR BIOLOGY, 2000, 302 (01) :265-279
[5]  
Hein J, 2001, Pac Symp Biocomput, P179
[6]   An expectation maximization algorithm for training hidden substitution models [J].
Holmes, I ;
Rubin, GM .
JOURNAL OF MOLECULAR BIOLOGY, 2002, 317 (05) :753-764
[7]   Evolutionary HMMs: a Bayesian approach to multiple alignment [J].
Holmes, I ;
Bruno, WJ .
BIOINFORMATICS, 2001, 17 (09) :803-820
[8]   Dynamic programming alignment accuracy [J].
Holmes, I ;
Durbin, R .
JOURNAL OF COMPUTATIONAL BIOLOGY, 1998, 5 (03) :493-504
[9]  
JENSEN J, 2002, 429 U AARH DEP THEOR
[10]   An efficient algorithm for statistical multiple alignment on arbitrary phylogenetic trees [J].
Lunter, GA ;
Miklós, I ;
Song, YS ;
Hein, J .
JOURNAL OF COMPUTATIONAL BIOLOGY, 2003, 10 (06) :869-889