Bayesian restoration of a hidden Markov chain with applications to DNA sequencing

Cited by: 13
Authors
Churchill, GA
Lazareva, B
Affiliations
[1] Jackson Lab, Bar Harbor, ME 04609 USA
[2] Mol Applicat Grp, Palo Alto, CA USA
Keywords
Bayesian inference; hidden Markov model; Monte Carlo Markov chain; sequence alignment;
DOI
10.1089/cmb.1999.6.261
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Hidden Markov models (HMMs) are a class of stochastic models that have proven to be powerful tools for the analysis of molecular sequence data. A hidden Markov model can be viewed as a black box that generates sequences of observations. The unobservable internal state of the box is stochastic and is determined by a finite state Markov chain. The observable output is stochastic with distribution determined by the state of the hidden Markov chain. We present a Bayesian solution to the problem of restoring the sequence of states visited by the hidden Markov chain from a given sequence of observed outputs. Our approach is based on a Monte Carlo Markov chain algorithm that allows us to draw samples from the full posterior distribution of the hidden Markov chain paths. The problem of estimating the probability of individual paths and the associated Monte Carlo error of these estimates is addressed. The method is illustrated by considering a problem of DNA sequence multiple alignment. The special structure for the hidden Markov model used in the sequence alignment problem is considered in detail. In conclusion, we discuss certain interesting aspects of biological sequence alignments that become accessible through the Bayesian approach to HMM restoration.
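The idea of drawing hidden-state paths from their posterior distribution and estimating the probability of individual paths can be illustrated with a minimal sketch. It assumes a small HMM with known, fixed parameters and uses forward filtering followed by backward sampling, which yields independent exact draws; this is a stand-in for illustration and not necessarily the Monte Carlo Markov chain sampler the abstract describes. All names (`sample_hidden_path`, `estimate_path_probabilities`, `init`, `trans`, `emit`) and the toy two-state model are hypothetical.

```python
import math
import random
from collections import Counter


def sample_hidden_path(obs, init, trans, emit, rng):
    """Draw one hidden-state path from p(path | obs) for fixed HMM parameters.

    init[k]     : probability that the chain starts in state k
    trans[j][k] : probability of moving from state j to state k
    emit[k][o]  : probability that state k emits symbol o
    """
    n_states = len(init)
    # Forward pass: alpha[t][k] = p(state_t = k | obs[0..t]), normalized per step.
    alpha = []
    for t, o in enumerate(obs):
        if t == 0:
            row = [init[k] * emit[k][o] for k in range(n_states)]
        else:
            prev = alpha[-1]
            row = [
                emit[k][o] * sum(prev[j] * trans[j][k] for j in range(n_states))
                for k in range(n_states)
            ]
        total = sum(row)
        alpha.append([v / total for v in row])

    # Backward pass: sample the last state, then each earlier state
    # conditional on the state already sampled after it.
    states = list(range(n_states))
    path = [0] * len(obs)
    path[-1] = rng.choices(states, weights=alpha[-1])[0]
    for t in range(len(obs) - 2, -1, -1):
        nxt = path[t + 1]
        weights = [alpha[t][j] * trans[j][nxt] for j in range(n_states)]
        path[t] = rng.choices(states, weights=weights)[0]
    return path


def estimate_path_probabilities(obs, init, trans, emit, n_draws=2000, seed=0):
    """Estimate posterior path probabilities by sample frequency.

    Returns {path: (estimate, standard_error)}. The binomial standard error
    is valid here because the draws are independent; correlated MCMC output
    would need a different error estimate.
    """
    rng = random.Random(seed)
    counts = Counter(
        tuple(sample_hidden_path(obs, init, trans, emit, rng))
        for _ in range(n_draws)
    )
    result = {}
    for path, count in counts.items():
        p = count / n_draws
        result[path] = (p, math.sqrt(p * (1.0 - p) / n_draws))
    return result


# Toy two-state, two-symbol model (illustrative values only).
toy_init = [0.6, 0.4]
toy_trans = [[0.7, 0.3], [0.4, 0.6]]
toy_emit = [[0.9, 0.1], [0.2, 0.8]]
estimates = estimate_path_probabilities([0, 0, 1, 1], toy_init, toy_trans, toy_emit)
for path, (p, se) in sorted(estimates.items(), key=lambda kv: -kv[1][0])[:3]:
    print(path, round(p, 3), round(se, 3))
```

Reporting a frequency together with its standard error mirrors the abstract's concern with the Monte Carlo error of path probability estimates: a path that appears in only a handful of draws has an estimate that should be read with corresponding caution.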
Pages: 261-277
Page count: 17