progressiveMauve: Multiple Genome Alignment with Gene Gain, Loss and Rearrangement

Cited by: 3038
Authors
Darling, Aaron E. [1, 2]
Mau, Bob [3, 4]
Perna, Nicole T. [1, 5]
Affiliations
[1] Univ Wisconsin, Genome Ctr, Madison, WI USA
[2] Univ Wisconsin, Dept Comp Sci, Madison, WI 53706 USA
[3] Univ Wisconsin, Ctr Biotechnol, Madison, WI 53705 USA
[4] Univ Wisconsin, Dept Oncol, Madison, WI USA
[5] Univ Wisconsin, Dept Genet, Madison, WI 53706 USA
Funding
U.S. National Institutes of Health; U.S. National Science Foundation;
Keywords
SEQUENCE ALIGNMENT; ESCHERICHIA-COLI; TREE RECONSTRUCTION; LOCAL ALIGNMENT; EVOLUTION; HOMOLOGY; INVERSION; INFERENCE; REPEATS; VIEW;
DOI
10.1371/journal.pone.0011147
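Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification codes
070301 [Inorganic Chemistry]; 070403 [Astrophysics]; 070507 [Natural Resources and Territorial Spatial Planning]; 090105 [Crop Production Systems and Ecological Engineering];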
Abstract
Background: Multiple genome alignment remains a challenging problem. Effects of recombination including rearrangement, segmental duplication, gain, and loss can create a mosaic pattern of homology even among closely related organisms.
Methodology/Principal Findings: We describe a new method to align two or more genomes that have undergone rearrangements due to recombination and substantial amounts of segmental gain and loss (flux). We demonstrate that the new method can accurately align regions conserved in some, but not all, of the genomes, an important case not handled by our previous work. The method uses a novel alignment objective score called a sum-of-pairs breakpoint score, which facilitates accurate detection of rearrangement breakpoints when genomes have unequal gene content. We also apply a probabilistic alignment filtering method to remove erroneous alignments of unrelated sequences, which are commonly observed in other genome alignment methods. We describe new metrics for quantifying genome alignment accuracy which measure the quality of rearrangement breakpoint predictions and indel predictions. The new genome alignment algorithm demonstrates high accuracy in situations where genomes have undergone biologically feasible amounts of genome rearrangement, segmental gain and loss. We apply the new algorithm to a set of 23 genomes from the genera Escherichia, Shigella, and Salmonella. Analysis of whole-genome multiple alignments allows us to extend the previously defined concepts of core- and pan-genomes to include not only annotated genes, but also non-coding regions with potential regulatory roles. The 23 enterobacteria have an estimated core-genome of 2.46 Mbp conserved among all taxa and a pan-genome of 15.2 Mbp. We document substantial population-level variability among these organisms driven by segmental gain and loss. Interestingly, much variability lies in intergenic regions, suggesting that the Enterobacteriaceae may exhibit regulatory divergence.
Conclusions: The multiple genome alignments generated by our software provide a platform for comparative genomic and population genomic studies. Free, open-source software implementing the described genome alignment approach is available from http://gel.ahabs.wisc.edu/mauve.
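The core- and pan-genome sizes quoted in the abstract can be illustrated with a minimal sketch. It assumes a whole-genome alignment has already been reduced to a list of segments, each with a length in base pairs and the set of genomes in which it is present. The `Segment` type, the function name `core_and_pan_size`, and the toy data are hypothetical and do not come from the paper or from the Mauve software; the authors' actual estimation procedure may differ (for example, in how gaps and segment lengths are counted).

```python
from typing import List, Set, Tuple

# Hypothetical representation: (segment length in bp, genomes containing the segment).
Segment = Tuple[int, Set[str]]


def core_and_pan_size(segments: List[Segment], genomes: Set[str]) -> Tuple[int, int]:
    """Return (core size, pan size) in base pairs.

    Assumption for this sketch: a segment counts toward the core-genome
    only if it is present in every genome, and toward the pan-genome
    if it is present in at least one genome.
    """
    core = sum(length for length, present in segments if present >= genomes)
    pan = sum(length for length, present in segments if present)
    return core, pan


# Toy data for three genomes (illustrative only, not from the paper).
genomes = {"A", "B", "C"}
segments = [
    (1000, {"A", "B", "C"}),  # conserved in all genomes -> core
    (400, {"A", "B"}),        # shared by a subset -> pan only
    (250, {"C"}),             # unique to one genome -> pan only
    (600, {"A", "B", "C"}),   # conserved in all genomes -> core
]

core, pan = core_and_pan_size(segments, genomes)
print(core, pan)  # 1600 2250
```

Counting aligned segments rather than annotated genes is what lets non-coding regions contribute to both totals, in the spirit of the extension the abstract describes.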
Pages: 17