Detecting homogeneous segments in DNA sequences by using hidden Markov models

Cited by: 55
Authors
Boys, RJ [1]
Henderson, DA [1]
Wilkinson, DJ [1]
Affiliation
[1] Newcastle Univ, Dept Stat, Newcastle Upon Tyne NE1 7RU, Tyne & Wear, England
Keywords
Bayesian estimation; bioinformatics; data augmentation; deoxyribonucleic acid sequences; hidden Markov models; intron 7 of the chimpanzee and human alpha-fetoprotein gene; Markov chain Monte Carlo methods
DOI
10.1111/1467-9876.00191
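Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714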
Abstract
In recent years there has been a rapid growth in the amount of DNA being sequenced and in its availability through genetic databases. Statistical techniques that identify structure within these sequences can be of considerable assistance to molecular biologists, particularly when they incorporate the discrete nature of changes caused by evolutionary processes. This paper focuses on the detection of homogeneous segments within heterogeneous DNA sequences. In particular, we study an intron from the chimpanzee alpha-fetoprotein gene; this protein plays an important role in the embryonic development of mammals. We present a Bayesian solution to this segmentation problem using a hidden Markov model implemented by Markov chain Monte Carlo methods. We consider the important practical problem of specifying informative prior knowledge about sequences of this type. Two Gibbs sampling algorithms are contrasted and the sensitivity of the analysis to the prior specification is investigated. Model selection and possible ways to overcome the label-switching problem are also addressed. Our analysis of intron 7 identifies three distinct homogeneous segment types, two of which occur in more than one region, and one of which is reversible.
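The abstract describes Bayesian segmentation of a DNA sequence with a hidden Markov model fitted by Gibbs sampling. The code below is a rough illustration of that idea. It is a generic sampler and should not be read as the authors' implementation or either of their two algorithms.

It rests on several assumptions:
- Within each hidden segment type, bases are independent draws from a type-specific distribution over A, C, G, T. The paper's emission model may be richer.
- Transition and emission rows have Dirichlet priors with hypothetical concentrations a_trans and a_emit.
- The initial-state distribution is fixed and uniform.
- All hidden states are updated jointly by forward filtering, backward sampling.

The names encode, ffbs, gibbs_hmm and relabel_by_gc are hypothetical. relabel_by_gc shows one simple post-hoc fix for label switching: it reorders states by their G+C probability. That is not necessarily one of the approaches considered in the paper.

```python
import numpy as np

# Hypothetical integer encoding of nucleotides.
BASES = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(seq):
    """Map a DNA string to an integer array (A, C, G, T -> 0..3)."""
    return np.array([BASES[b] for b in seq.upper()], dtype=int)


def ffbs(y, P, E, pi0, rng):
    """Forward filtering, backward sampling: one joint draw of all hidden
    segment labels given transition matrix P, emissions E, initial pi0."""
    n, K = len(y), P.shape[0]
    alpha = np.empty((n, K))
    a = pi0 * E[:, y[0]]
    alpha[0] = a / a.sum()
    for t in range(1, n):
        a = (alpha[t - 1] @ P) * E[:, y[t]]
        alpha[t] = a / a.sum()  # normalise each step to avoid underflow
    s = np.empty(n, dtype=int)
    s[-1] = rng.choice(K, p=alpha[-1])
    for t in range(n - 2, -1, -1):
        # p(s_t = i | s_{t+1}, y_1..t) is proportional to alpha_t(i) * P[i, s_{t+1}]
        w = alpha[t] * P[:, s[t + 1]]
        s[t] = rng.choice(K, p=w / w.sum())
    return s


def gibbs_hmm(y, K, n_iter=500, a_trans=1.0, a_emit=1.0, seed=0):
    """Sketch of a Gibbs sampler for a K-state HMM with independent emissions
    (assumption) and symmetric Dirichlet priors (hypothetical a_trans, a_emit)."""
    rng = np.random.default_rng(seed)
    pi0 = np.full(K, 1.0 / K)  # fixed uniform initial distribution (assumption)
    P = rng.dirichlet(np.full(K, a_trans), size=K)
    E = rng.dirichlet(np.full(4, a_emit), size=K)
    draws = []
    for _ in range(n_iter):
        s = ffbs(y, P, E, pi0, rng)
        # Count state-to-state transitions and base emissions per state.
        trans = np.zeros((K, K))
        np.add.at(trans, (s[:-1], s[1:]), 1)
        emit = np.zeros((K, 4))
        np.add.at(emit, (s, y), 1)
        # Conjugate Dirichlet updates, one row at a time.
        P = np.vstack([rng.dirichlet(a_trans + trans[k]) for k in range(K)])
        E = np.vstack([rng.dirichlet(a_emit + emit[k]) for k in range(K)])
        draws.append((s, P, E))
    return draws


def relabel_by_gc(s, P, E):
    """One simple identifiability constraint (illustrative): relabel states so
    that their G+C emission probability increases with the label."""
    order = np.argsort(E[:, 1] + E[:, 2])  # order[new] = old label
    inv = np.empty_like(order)
    inv[order] = np.arange(len(order))     # inv[old] = new label
    return inv[s], P[np.ix_(order, order)], E[order]
```

For example, draws = gibbs_hmm(encode(seq), K=3, n_iter=2000) returns a list of (labels, P, E) draws. One would typically discard a burn-in period and check convergence before summarising where segment boundaries fall. Joint (block) updating of the hidden states usually mixes better than updating each position's label one at a time given its neighbours. The cost is a full forward pass over the sequence in every iteration.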
Pages: 269-285
Page count: 17