When Whole-Genome Alignments Just Won't Work: kSNP v2 Software for Alignment-Free SNP Discovery and Phylogenetics of Hundreds of Microbial Genomes

Cited by: 168
Authors
Gardner, Shea N. [1 ]
Hall, Barry G. [2 ]
Affiliations
[1] Lawrence Livermore Natl Lab, Livermore, CA 94550 USA
[2] Bellingham Res Inst, Bellingham, WA USA
Source
PLOS ONE | 2013 / Vol. 8 / Issue 12
Keywords
TREES; EPIDEMIOLOGY; EVOLUTION;
DOI
10.1371/journal.pone.0081760
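Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes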
07 ; 0710 ; 09 ;
Abstract
Effective use of rapid and inexpensive whole genome sequencing for microbes requires fast, memory efficient bioinformatics tools for sequence comparison. The kSNP v2 software finds single nucleotide polymorphisms (SNPs) in whole genome data. kSNP v2 has numerous improvements over kSNP v1 including SNP gene annotation; better scaling for draft genomes available as assembled contigs or raw, unassembled reads; a tool to identify the optimal value of k; distribution of packages of executables for Linux and Mac OS X for ease of installation and user-friendly use; and a detailed User Guide. SNP discovery is based on k-mer analysis, and requires no multiple sequence alignment or the selection of a single reference genome. Most target sets with hundreds of genomes complete in minutes to hours. SNP phylogenies are built by maximum likelihood, parsimony, and distance, based on all SNPs, only core SNPs, or SNPs present in some intermediate user-specified fraction of targets. The SNP-based trees that result are consistent with known taxonomy. kSNP v2 can handle many gigabases of sequence in a single run, and if one or more annotated genomes are included in the target set, SNPs are annotated with protein coding and other information (UTRs, etc.) from GenBank file(s). We demonstrate application of kSNP v2 on sets of viral and bacterial genomes, and discuss in detail analysis of a set of 68 finished E. coli and Shigella genomes and a set of the same genomes to which have been added 47 assemblies and four "raw read" genomes of O104:H4 strains from the recent European E. coli outbreak that resulted in both bloody diarrhea and hemolytic uremic syndrome (HUS), and caused at least 50 deaths.
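The abstract's idea of alignment-free, reference-free SNP discovery from k-mers can be illustrated with a toy sketch. This is not the kSNP v2 implementation: the function names (`kmers`, `find_snp_loci`, `filter_by_fraction`) are hypothetical, and the sketch assumes an odd k, treats a SNP locus as a pair of identical flanking sequences whose central base differs between genomes, ignores reverse complements, and holds everything in memory, which would not scale to the data sizes the abstract describes.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def find_snp_loci(genomes, k):
    """Return {(left_flank, right_flank): {genome_name: central_base}}.

    genomes maps a name to a DNA string. A locus is reported only when
    at least two genomes share the flanks but differ at the central base.
    Hypothetical helper for illustration; not kSNP's actual code.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so each k-mer has a central base")
    half = k // 2
    loci = defaultdict(dict)
    ambiguous = set()  # flanks seen with two different centres in ONE genome
    for name, seq in genomes.items():
        for kmer in kmers(seq.upper(), k):
            context = (kmer[:half], kmer[half + 1:])
            base = kmer[half]
            seen = loci[context].get(name)
            if seen is not None and seen != base:
                ambiguous.add(context)
            loci[context][name] = base
    return {
        ctx: alleles
        for ctx, alleles in loci.items()
        if ctx not in ambiguous and len(set(alleles.values())) > 1
    }


def filter_by_fraction(snps, n_genomes, min_fraction):
    """Keep loci present in at least min_fraction of genomes (1.0 = core)."""
    return {
        ctx: alleles
        for ctx, alleles in snps.items()
        if len(alleles) / n_genomes >= min_fraction
    }


if __name__ == "__main__":
    toy = {
        "g1": "ACGTACGATTGCA",
        "g2": "ACGTACCATTGCA",  # differs from g1 at one position
        "g3": "TTTTTTTT",       # lacks the locus entirely
    }
    snps = find_snp_loci(toy, k=5)
    print(snps)
    print(filter_by_fraction(snps, len(toy), 1.0))  # core SNPs only
```

In this toy setting, `min_fraction=1.0` corresponds to the "core SNPs" option mentioned in the abstract, and lower values correspond to an intermediate user-specified fraction of targets.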
Pages: 12