Optimizing Read Mapping to Reference Genomes to Determine Composition and Species Prevalence in Microbial Communities

Cited by: 39
Authors
Martin, John [1]
Sykes, Sean [2]
Young, Sarah [2]
Kota, Karthik [1]
Sanka, Ravi [3]
Sheth, Nihar [4]
Orvis, Joshua [5]
Sodergren, Erica [1, 6]
Wang, Zhengyuan [1]
Weinstock, George M. [1, 6]
Mitreva, Makedonka [1, 6]
Affiliations
[1] Washington Univ, Sch Med, Genome Inst, St Louis, MO 63130 USA
[2] MIT & Harvard, Broad Inst, Cambridge, MA USA
[3] J Craig Venter Inst, Rockville, MD USA
[4] Virginia Commonwealth Univ, Ctr Study Biol Complex, Richmond, VA USA
[5] Univ Maryland, Sch Med, Inst Genome Sci, Baltimore, MD 21201 USA
[6] Washington Univ, Sch Med, Dept Genet, St Louis, MO 63110 USA
Source
PLOS ONE | 2012 / Vol. 7 / Issue 06
Funding
National Institutes of Health (USA);
Keywords
SEQUENCE; ALIGNMENT;
DOI
10.1371/journal.pone.0036427
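Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07; 0710; 09;
Abstract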
The Human Microbiome Project (HMP) aims to characterize the microbial communities of 18 body sites from healthy individuals. To accomplish this, the HMP generated two types of shotgun data: reference shotgun sequences isolated from different anatomical sites on the human body, and shotgun metagenomic sequences from the microbial communities of each site. The alignment strategy used to characterize these metagenomic communities against the available reference sequences is important to the success of HMP data analysis. Six next-generation aligners were used to align a community of known composition against a database comprising reference organisms known to be present in that community. All aligners reported nearly complete genome coverage (>97%) for strains with over 6X depth of coverage; however, they differed in speed, memory requirements, and ease-of-use issues such as database size limitations and supported mapping strategies. The selected aligner was tested across a range of parameters to maximize sensitivity while maintaining a low false positive rate. We found that constraining alignment length had more impact on sensitivity than constraining similarity in all cases tested. However, when reference species were replaced with phylogenetic neighbors, similarity began to play a larger role in detection. We also show that choosing the top hit randomly when multiple, equally strong mappings are available increases overall sensitivity at the expense of taxonomic resolution. The results of this study identified a strategy that was used to map over 3 terabases of microbial sequence against a database of more than 5,000 reference genomes in just over a month.
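The filtering and tie-breaking ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the hit records, the field names (`read`, `ref`, `read_len`, `aln_len`, `identity`, `score`), the helper functions, and the thresholds (`min_len_frac=0.75`, `min_identity=0.80`) are all hypothetical and chosen only for illustration. The sketch drops hits that fail an alignment-length or similarity constraint, then assigns each read to one reference, picking at random among equally strong top hits.

```python
import random
from collections import defaultdict


def filter_hits(hits, min_len_frac=0.75, min_identity=0.80):
    """Keep hits that satisfy both constraints (threshold values are illustrative)."""
    return [
        h for h in hits
        if h["aln_len"] / h["read_len"] >= min_len_frac
        and h["identity"] >= min_identity
    ]


def pick_top_hit(hits, rng):
    """Return the best-scoring hit; break ties by a uniform random choice."""
    best = max(h["score"] for h in hits)
    tied = [h for h in hits if h["score"] == best]
    return rng.choice(tied)


def assign_reads(hits, rng=None, **thresholds):
    """Map each read name to the reference of its (randomly tie-broken) top hit."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    by_read = defaultdict(list)
    for h in filter_hits(hits, **thresholds):
        by_read[h["read"]].append(h)
    return {read: pick_top_hit(hs, rng)["ref"] for read, hs in by_read.items()}


# Hypothetical hits: r2 maps equally well to refA and refB; r3 aligns over too short a span.
example_hits = [
    {"read": "r1", "ref": "refA", "read_len": 100, "aln_len": 95, "identity": 0.98, "score": 90},
    {"read": "r2", "ref": "refA", "read_len": 100, "aln_len": 100, "identity": 0.99, "score": 98},
    {"read": "r2", "ref": "refB", "read_len": 100, "aln_len": 100, "identity": 0.99, "score": 98},
    {"read": "r2", "ref": "refC", "read_len": 100, "aln_len": 100, "identity": 0.90, "score": 80},
    {"read": "r3", "ref": "refA", "read_len": 100, "aln_len": 40, "identity": 1.00, "score": 40},
]
assigned = assign_reads(example_hits)
```

In this toy input, read `r2` is always counted, but whether it lands on `refA` or `refB` depends on the random draw, which mirrors the trade-off the abstract describes: more reads are assigned, yet the assignment no longer distinguishes between equally similar references.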
Pages: 15