A gene-by-gene population genomics platform: de novo assembly, annotation and genealogical analysis of 108 representative Neisseria meningitidis genomes

Cited by: 143
Authors
Bratcher, Holly B. [1]
Corton, Craig [2]
Jolley, Keith A. [1]
Parkhill, Julian [2]
Maiden, Martin C. J. [1]
Affiliations
[1] Univ Oxford, Dept Zool, Oxford OX1 3PS, England
[2] Sanger Inst, Hinxton, England
Funding
Wellcome Trust (UK);
Keywords
Neisseria meningitidis; de novo assembly; BIGSdb; Gene-by-gene analysis; cgMLST; rMLST; rST; Bacterial population genomics; CORYNEBACTERIUM-PSEUDOTUBERCULOSIS I19; RESTRICTION-MODIFICATION SYSTEMS; MENINGOCOCCAL DISEASE; BACILLUS-SUBTILIS; SEQUENCE; SEROGROUP; STRAIN; CARRIAGE; AMPLIFICATION; NOMENCLATURE;
DOI
10.1186/1471-2164-15-1138
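Chinese Library Classification number
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 [Microbiology]; 090105 [Crop production systems and ecological engineering];
Abstract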
Background: Highly parallel, 'second generation' sequencing technologies have rapidly expanded the number of bacterial whole genome sequences available for study, permitting the emergence of the discipline of population genomics. Most of these data are publicly available as unassembled short-read sequence files that require extensive processing before they can be used for analysis. The provision of data in a uniform format, which can be easily assessed for quality, linked to provenance and phenotype and used for analysis, is therefore necessary.
Results: The performance of de novo short-read assembly followed by automatic annotation using the PubMLST.org Neisseria database was assessed and evaluated for 108 diverse, representative, and well-characterised Neisseria meningitidis isolates. High-quality sequences were obtained for >99% of known meningococcal genes among the de novo assembled genomes and four resequenced genomes, and less than 1% of reassembled genes had sequence discrepancies or misassembled sequences. A core genome of 1600 loci, present in at least 95% of the population, was determined using the Genome Comparator tool. Genealogical relationships compatible with, but at a higher resolution than, those identified by multilocus sequence typing were obtained with core genome comparisons and ribosomal protein gene analysis, which revealed a genomic structure for a number of previously described phenotypes. This unified system for cataloguing Neisseria genetic variation in the genome was implemented and used for multiple analyses, and the data are publicly available in the PubMLST Neisseria database.
Conclusions: The de novo assembly, combined with automated gene-by-gene annotation, generates high-quality draft genomes in which the majority of protein-encoding genes are present with high accuracy. The approach catalogues diversity efficiently, permits analyses of a single genome or multiple genome comparisons, and is a practical approach to interpreting WGS data for large bacterial population samples. The method generates novel insights into the biology of the meningococcus and improves our understanding of the whole population structure, not just disease-causing lineages.
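To illustrate the core-genome criterion stated in the Results (loci present in at least 95% of the population), the following minimal Python sketch filters loci by the fraction of isolates in which they are found. The function name `core_loci`, the input layout, and the toy locus and isolate identifiers are assumptions made for illustration only; this is not the Genome Comparator implementation.

```python
from typing import Dict, List, Set


def core_loci(
    presence: Dict[str, Set[str]],
    n_isolates: int,
    threshold: float = 0.95,
) -> List[str]:
    """Return the loci found in at least `threshold` of all isolates.

    `presence` maps a locus identifier to the set of isolate identifiers
    in which that locus was detected (hypothetical input layout).
    """
    if n_isolates <= 0:
        raise ValueError("n_isolates must be positive")
    return sorted(
        locus
        for locus, isolates in presence.items()
        if len(isolates) / n_isolates >= threshold
    )


# Toy data: 20 hypothetical isolates and three hypothetical loci.
isolates = [f"isolate_{i}" for i in range(20)]
presence = {
    "locus_A": set(isolates),       # present in 20/20 isolates
    "locus_B": set(isolates[:19]),  # present in 19/20 isolates (95%)
    "locus_C": set(isolates[:18]),  # present in 18/20 isolates (90%)
}

core = core_loci(presence, n_isolates=len(isolates))
print(core)
```

With the default threshold, a locus missing from more than 5% of isolates (here `locus_C`) is excluded from the core set; lowering `threshold` relaxes the definition.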
Pages: 16