Benchmarking of Methods for Genomic Taxonomy

Times cited: 258
Authors
Larsen, Mette V. [1]
Cosentino, Salvatore [1]
Lukjancenko, Oksana [1]
Saputra, Dhany [1]
Rasmussen, Simon [1]
Hasman, Henrik [2]
Sicheritz-Ponten, Thomas [1]
Aarestrup, Frank M. [2]
Ussery, David W. [1, 3]
Lund, Ole [1]
Affiliations
[1] Tech Univ Denmark, Ctr Biol Sequence Anal, Dept Syst Biol, DK-2800 Lyngby, Denmark
[2] Tech Univ Denmark, Natl Food Inst, DK-2800 Lyngby, Denmark
[3] Oak Ridge Natl Lab, Biosci Div, Comparat Genom Grp, Oak Ridge, TN USA
Keywords
SEQUENCING-BASED METHODS; BACILLUS-CEREUS GROUP; IN-SILICO ANALYSIS; ESCHERICHIA-COLI; IDENTIFICATION; PROTEIN; DATABASE; TREE; ALIGNMENT; DOMAIN
DOI
10.1128/JCM.02981-13
Chinese Library Classification (CLC) number
Q93 [Microbiology]
Discipline classification code
071005; 100705
Abstract
One of the first issues that emerges when a prokaryotic organism of interest is encountered is the question of what it is: that is, which species it is. The 16S rRNA gene formed the basis of the first method for sequence-based taxonomy and has had a tremendous impact on the field of microbiology. Nevertheless, the method has been found to have a number of shortcomings. In the current study, we trained and benchmarked five methods for whole-genome sequence-based prokaryotic species identification on a common data set of complete genomes: (i) SpeciesFinder, which is based on the complete 16S rRNA gene; (ii) Reads2Type, which searches for species-specific 50-mers in either the 16S rRNA gene or the gyrB gene (for the Enterobacteriaceae family); (iii) the ribosomal multilocus sequence typing (rMLST) method, which samples up to 53 ribosomal genes; (iv) TaxonomyFinder, which is based on species-specific functional protein domain profiles; and finally (v) KmerFinder, which examines the number of co-occurring k-mers (substrings of k nucleotides in DNA sequence data). The performances of the methods were subsequently evaluated on three data sets of short sequence reads or draft genomes from public databases. In total, the evaluation sets comprised sequence data from more than 11,000 isolates covering 159 genera and 243 species. Our results indicate that methods that sample only chromosomal, core genes have difficulty distinguishing closely related species that only recently diverged. The KmerFinder method had the highest overall accuracy and correctly identified 93% to 97% of the isolates in the evaluation sets.
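To make the two k-mer-based ideas in the abstract more concrete, the following is a minimal, hypothetical Python sketch; it is not the published KmerFinder or Reads2Type code. `best_match` scores each reference genome by how many distinct query k-mers it shares, in the spirit of KmerFinder's co-occurring k-mer count, and `species_specific_kmers` keeps only the k-mers unique to one species, in the spirit of Reads2Type's species-specific markers. All function names, the default k = 16, and the simplifications (exact matching, one strand only, no statistical scoring) are assumptions made for illustration.

```python
from collections import Counter


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings (k-mers) of a DNA sequence.

    Assumption: only the given strand is used; real tools typically also
    consider the reverse complement.
    """
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_match(query: str, references: dict[str, str], k: int = 16) -> tuple[str, int]:
    """Return the reference label sharing the most distinct k-mers with the query.

    Hypothetical KmerFinder-style scoring: each reference is scored by the
    number of distinct query k-mers that also occur in it. The default k=16
    is an illustrative choice, not a value taken from the abstract.
    """
    query_kmers = kmers(query, k)
    scores = {label: len(query_kmers & kmers(seq, k)) for label, seq in references.items()}
    label = max(scores, key=scores.get)  # ties resolve to the first label inserted
    return label, scores[label]


def species_specific_kmers(references: dict[str, str], k: int) -> dict[str, set[str]]:
    """Return, for each species, the k-mers that occur in no other species.

    Hypothetical Reads2Type-style marker selection (the abstract describes
    50-mers from the 16S rRNA or gyrB gene; any k works here).
    """
    per_species = {label: kmers(seq, k) for label, seq in references.items()}
    # Count in how many species each k-mer appears.
    counts = Counter(km for ks in per_species.values() for km in ks)
    return {label: {km for km in ks if counts[km] == 1} for label, ks in per_species.items()}
```

With toy references `{"species_A": "ACGTACGTTTGACCA", "species_B": "GGGCCCAAATTTGGG"}`, `best_match("ACGTACGTTT", refs, k=4)` returns `("species_A", 6)`, because all six distinct 4-mers of the query occur in species_A and none occur in species_B. The set-based design keeps the sketch short; a production tool would index the k-mers of many thousands of genomes once, rather than recomputing them per query.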
Pages: 1529-1539
Page count: 11