Accurate phylogenetic classification of variable-length DNA fragments

Cited by: 315
Authors
McHardy, Alice Carolyn
Garcia Martin, Hector
Tsirigos, Aristotelis
Hugenholtz, Philip
Rigoutsos, Isidore
Affiliations
[1] IBM Corp, Thomas J Watson Res Ctr, Bioinformat & Pattern Discovery Grp, Yorktown Hts, NY 10598 USA
[2] US DOE, Joint Genome Inst, Walnut Creek, CA 94598 USA
DOI
10.1038/NMETH976
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Metagenome studies have retrieved vast amounts of sequence data from a variety of environments, leading to new discoveries and insights into the uncultured microbial world. Except for very simple communities, the encountered diversity has made fragment assembly and the subsequent analysis a challenging problem. A taxonomic characterization of metagenomic fragments is required for a deeper understanding of shotgun-sequenced microbial communities, but success has mostly been limited to sequences containing phylogenetic marker genes. Here we present PhyloPythia, a composition-based classifier that combines higher-level generic clades from a set of 340 completed genomes with sample-derived population models. Extensive analyses on synthetic and real metagenome data sets showed that PhyloPythia allows the accurate classification of most sequence fragments across all considered taxonomic ranks, even for unknown organisms. The method requires no more than 100 kb of training sequence for the creation of accurate models of sample-specific populations and can assign fragments ≥ 1 kb with high specificity. © 2007 Nature Publishing Group.
Pages: 63-72
Page count: 10