Detection of recombination in DNA multiple alignments with hidden Markov models

Cited by: 24
Authors
Husmeier, D [1]
Wright, F [1]
Affiliations
[1] Scottish Crop Res Inst, BioSS, Dundee DD2 5DA, Scotland
Keywords
phylogenetic trees; multiple alignments of DNA sequences; recombination; hidden Markov models; maximum likelihood and the expectation maximization (EM) algorithm
DOI
10.1089/106652701752236214
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Conventional phylogenetic tree estimation methods assume that all sites in a DNA multiple alignment have the same evolutionary history. This assumption is violated in data sets from certain bacteria and viruses due to recombination, a process that leads to the creation of mosaic sequences from different strains and, if undetected, causes systematic errors in phylogenetic tree estimation. In the current work, a hidden Markov model (HMM) is employed to detect recombination events in multiple alignments of DNA sequences. The emission probabilities in a given state are determined by the branching order (topology) and the branch lengths of the respective phylogenetic tree, while the transition probabilities depend on the global recombination probability. The present study improves on an earlier heuristic parameter optimization scheme and shows how the branch lengths and the recombination probability can be optimized in a maximum likelihood sense by applying the expectation maximization (EM) algorithm. The novel algorithm is tested on a synthetic benchmark problem and is found to clearly outperform the earlier heuristic approach. The paper concludes with an application of this scheme to a DNA sequence alignment of the argF gene from four Neisseria strains, where a likely recombination event is clearly detected.
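The HMM structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the hidden states are K candidate tree topologies; a single hypothetical parameter `nu` is the probability of keeping the same topology between adjacent sites, with the remaining mass spread evenly over the other topologies; and the per-site emission likelihoods `emis[t][s]` are supplied as input rather than computed from a phylogenetic model. Branch-length optimization is omitted. The function names are illustrative only.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def transition_matrix(k: int, nu: float) -> Matrix:
    """Assumed form: stay with probability nu, else switch uniformly."""
    off = (1.0 - nu) / (k - 1)
    return [[nu if i == j else off for j in range(k)] for i in range(k)]


def forward_backward(emis: Matrix, nu: float) -> Tuple[Matrix, float]:
    """Return per-site posteriors over topologies and the expected
    number of 'stay' transitions (the E-step quantities needed for nu).

    emis[t][s] is the likelihood of alignment column t under topology s.
    """
    n, k = len(emis), len(emis[0])
    a = transition_matrix(k, nu)

    # Scaled forward pass with a uniform prior over topologies.
    alpha: Matrix = []
    for t in range(n):
        if t == 0:
            cur = [emis[0][s] / k for s in range(k)]
        else:
            cur = [emis[t][j] * sum(alpha[t - 1][i] * a[i][j] for i in range(k))
                   for j in range(k)]
        c = sum(cur)
        alpha.append([x / c for x in cur])

    # Scaled backward pass.
    beta: Matrix = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        cur = [sum(a[i][j] * emis[t + 1][j] * beta[t + 1][j] for j in range(k))
               for i in range(k)]
        c = sum(cur)
        beta[t] = [x / c for x in cur]

    # Posterior probability of each topology at each site.
    gamma: Matrix = []
    for t in range(n):
        g = [alpha[t][s] * beta[t][s] for s in range(k)]
        z = sum(g)
        gamma.append([x / z for x in g])

    # Expected number of transitions that keep the same topology.
    stays = 0.0
    for t in range(n - 1):
        xi = [[alpha[t][i] * a[i][j] * emis[t + 1][j] * beta[t + 1][j]
               for j in range(k)] for i in range(k)]
        z = sum(sum(row) for row in xi)
        stays += sum(xi[i][i] for i in range(k)) / z
    return gamma, stays


def em_update_nu(emis: Matrix, nu: float) -> float:
    """One EM step for nu: expected stays divided by number of transitions."""
    _, stays = forward_backward(emis, nu)
    return stays / (len(emis) - 1)


# Toy input (made up): 3 topologies, 40 sites, with the favoured
# topology changing from 0 to 1 halfway along the alignment.
toy_emis = [[0.6, 0.2, 0.2]] * 20 + [[0.2, 0.6, 0.2]] * 20
toy_gamma, _ = forward_backward(toy_emis, nu=0.8)
toy_nu = em_update_nu(toy_emis, nu=0.8)
```

In this toy input, a recombination breakpoint would show up as the site where the most probable topology in `toy_gamma` changes. Repeating `em_update_nu` until the value stabilizes gives a maximum likelihood estimate of `nu` under the assumed transition form.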
Pages: 401-427
Page count: 27