A de Bruijn Graph Approach to the Quantification of Closely-Related Genomes in a Microbial Community

Citations: 16
Authors
Wang, Mingjie [1]
Ye, Yuzhen [1]
Tang, Haixu [1, 2]
Affiliations
[1] Indiana Univ, Sch Informat & Comp, Bloomington, IN 47405 USA
[2] Indiana Univ, Ctr Genom & Bioinformat, Bloomington, IN 47405 USA
Funding
US National Institutes of Health
Keywords
closely-related genomes; de Bruijn graph; metagenomics; quantification; ALIGNMENT; SEQUENCE; GENERATION;
DOI
10.1089/cmb.2012.0058
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
The wide application of next-generation sequencing (NGS) technologies in metagenomics has raised many computational challenges. One of the essential problems in metagenomics is to estimate the taxonomic composition of a microbial community, which can be approached by mapping shotgun reads acquired from the community to previously characterized microbial genomes, followed by quantity profiling of these species based on the number of mapped reads. This procedure, however, is not as trivial as it appears at first glance. A shotgun metagenomic dataset often contains DNA sequences from many closely-related microbial species (e.g., within the same genus) or strains (e.g., within the same species); thus, it is often difficult to determine which species or strain a specific read was sampled from when it can be mapped to a common region shared by multiple genomes at high similarity. Furthermore, high genomic variation is observed among individual genomes within the same species, which is difficult to differentiate from inter-species variation during read mapping. To address these issues, a commonly used approach is to quantify taxonomic distribution only at the genus level, based on the reads mapped to all species belonging to the same genus; alternatively, reads are mapped to a set of representative genomes, each selected to represent a different genus. Here, we introduce a novel approach to the quantity estimation of closely-related species within the same genus by mapping the reads to their genomes represented by a de Bruijn graph, in which the common genomic regions among them are collapsed.
Using simulated and real metagenomic datasets, we show that the de Bruijn graph approach has several advantages over existing methods: (1) it avoids redundant mapping of shotgun reads to multiple copies of the common regions in different genomes, and (2) it leads to more accurate quantification of closely-related species (and even of strains within the same species).
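The idea of collapsing common regions can be illustrated with a toy sketch. The code below is not the authors' implementation: it replaces a full de Bruijn graph with a simpler k-mer index that records which genomes contain each k-mer, so a shared k-mer is stored once. The genome strings, the reads, the value k = 4, and the function names (`build_colored_index`, `quantify`) are all hypothetical, and the sketch ignores reverse complements, sequencing errors, and genome-length normalization.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_colored_index(genomes, k):
    """Map each k-mer to the set of genome names that contain it.

    A k-mer shared by several genomes is stored once, which mimics the
    collapsing of common regions in a de Bruijn graph.
    """
    index = defaultdict(set)
    for name, seq in genomes.items():
        for km in kmers(seq, k):
            index[km].add(name)
    return index


def quantify(reads, index, genomes, k):
    """Estimate relative abundance from reads that hit genome-specific k-mers.

    A read is assigned to a genome only if all of its genome-specific
    k-mers point to that one genome; reads covering only shared k-mers
    are left unassigned instead of being counted several times.
    """
    counts = {name: 0 for name in genomes}
    for read in reads:
        owners = set()
        for km in kmers(read, k):
            hit = index.get(km)
            if hit is not None and len(hit) == 1:
                owners |= hit
        if len(owners) == 1:
            counts[next(iter(owners))] += 1
    total = sum(counts.values())
    return {n: (c / total if total else 0.0) for n, c in counts.items()}


# Hypothetical toy genomes that share the prefix GATTACAGGC.
genomes = {
    "A": "GATTACAGGCCTTAAG",
    "B": "GATTACAGGCGTAACC",
}
# Hypothetical reads: one from the shared region, three from A, one from B.
reads = ["GATTACA", "GGCCTTA", "CCTTAAG", "GCGTAAC", "ACAGGCC"]

K = 4
index = build_colored_index(genomes, K)
abundance = quantify(reads, index, genomes, K)
print(abundance)  # {'A': 0.75, 'B': 0.25}
```

In this toy case the index holds 19 distinct k-mers rather than the 26 that two separate per-genome indexes would hold, and the read drawn entirely from the shared prefix contributes to neither genome rather than to both.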
Pages: 814-825
Page count: 12