Bayesian modeling of recombination events in bacterial populations

Cited by: 22
Authors
Marttinen, Pekka [1]
Baldwin, Adam [2]
Hanage, William P. [3]
Dowson, Chris [2]
Mahenthiralingam, Eshwar [4]
Corander, Jukka [5]
Affiliations
[1] Univ Helsinki, Dept Math & Stat, FIN-00014 Helsinki, Finland
[2] Univ Warwick, Dept Biol Sci, Coventry CV4 7AL, W Midlands, England
[3] Univ London Imperial Coll Sci Technol & Med, Dept Infect Dis Epidemiol, London W2 1PG, England
[4] Cardiff Univ, Cardiff Sch Biosci, Cardiff CF10 3TL, S Glam, Wales
[5] Abo Akad Univ, Dept Math, FIN-20500 Turku, Finland
Funding
Academy of Finland
DOI
10.1186/1471-2105-9-421
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: We consider the discovery of recombinant segments, jointly with their origins, within multilocus DNA sequences from bacteria representing heterogeneous populations of fairly closely related species. The currently available methods for recombination detection that can characterize uncertainty probabilistically have limited applicability in practice as the number of strains in a data set increases. Results: We introduce a Bayesian spatial structural model representing the continuum of origins over sites within the observed sequences, including a probabilistic characterization of the uncertainty related to the origin of any particular site. To enable a statistically accurate and practically feasible analysis of large-scale data sets representing a single genus, we have developed a novel software tool (BRAT, Bayesian Recombination Tracker) implementing the model and the corresponding learning algorithm, which can identify the posterior optimal structure and estimate the marginal posterior probabilities of putative origins over the sites. Conclusion: A multitude of challenging simulation scenarios and an analysis of real data from seven housekeeping genes of 120 strains of the genus Burkholderia are used to illustrate the possibilities offered by our approach. The software is freely available for download at http://web.abo.fi/fak/mnf//mate/jc/software/brat.html.
Pages: 28
Related papers
41 entries in total
[1]  
Aarts E., 1989, Wiley-Interscience Series in Discrete Mathematics and Optimization
[2]  
[Anonymous], 1995, Theory of Statistics
[3]  
[Anonymous], 1989, Cladistics, DOI 10.1111/J.1096-0031.1989.TB00562.X
[4]   Recodon: Coalescent simulation of coding DNA sequences with recombination, migration and demography [J].
Arenas, Miguel ;
Posada, David .
BMC BIOINFORMATICS, 2007, 8 (1)
[5]   Multilocus sequence typing scheme that provides both species and strain differentiation for the Burkholderia cepacia complex [J].
Baldwin, A ;
Mahenthiralingam, E ;
Thickett, KM ;
Honeybourne, D ;
Maiden, MCJ ;
Govan, JR ;
Speert, DP ;
LiPuma, JJ ;
Vandamme, P ;
Dowson, CG .
JOURNAL OF CLINICAL MICROBIOLOGY, 2005, 43 (09) :4665-4673
[6]   The Burkholderia cepacia epidemic strain marker is part of a novel genomic island encoding both virulence and metabolism-associated genes in Burkholderia cenocepacia [J].
Baldwin, A ;
Sokol, PA ;
Parkhill, J ;
Mahenthiralingam, E .
INFECTION AND IMMUNITY, 2004, 72 (03) :1537-1547
[7]   Environmental Burkholderia cepacia complex isolates in human infections [J].
Baldwin, Adam ;
Mahenthiralingam, Eshwar ;
Drevinek, Pavel ;
Vandamme, Peter ;
Govan, John R. ;
Waine, David J. ;
LiPuma, John J. ;
Chiarini, Luigi ;
Dalmastri, Claudia ;
Henry, Deborah A. ;
Speert, David P. ;
Honeybourne, David ;
Maiden, Martin C. J. ;
Dowson, Chris G. .
EMERGING INFECTIOUS DISEASES, 2007, 13 (03) :458-461
[8]  
Braun JV, 1998, STAT SCI, V13, P142
[9]   Detecting recombination in evolving nucleotide sequences [J].
Chan, Cheong Xin ;
Beiko, Robert G. ;
Ragan, Mark A. .
BMC BIOINFORMATICS, 2006, 7 (1)
[10]   A systematics for discovering the fundamental units of bacterial diversity [J].
Cohan, Frederick M. ;
Perry, Elizabeth B. .
CURRENT BIOLOGY, 2007, 17 (10) :R373-R386