A Markov chain Monte Carlo expectation maximization algorithm for statistical analysis of DNA sequence evolution with neighbor-dependent substitution rates

Cited by: 18
Author
Hobolth, Asger [1]
Affiliation
[1] N Carolina State Univ, Bioinformat Res Ctr, Raleigh, NC 27695 USA
Keywords
EM-algorithm; Gibbs sampling; likelihood inference; molecular evolution; neighbor-dependence; path sampling
DOI
10.1198/106186008X289010
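Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract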
The evolution of DNA sequences can be described by discrete state continuous time Markov processes on a phylogenetic tree. We consider neighbor-dependent evolutionary models where the instantaneous rate of substitution at a site depends on the states of the neighboring sites. Neighbor-dependent substitution models are analytically intractable and must be analyzed using either approximate or simulation-based methods. We describe statistical inference of neighbor-dependent models using a Markov chain Monte Carlo expectation maximization (MCMC-EM) algorithm. In the MCMC-EM algorithm, the high-dimensional integrals required in the EM algorithm are estimated using MCMC sampling. The MCMC sampler requires simulation of sample paths from a continuous time Markov process, conditional on the beginning and ending states and the paths of the neighboring sites. An exact path sampling algorithm is developed for this purpose.
Pages: 138-162
Page count: 25