Protein signature-based estimation of metagenomic abundances including all domains of life and viruses

Cited by: 26
Authors
Klingenberg, Heiner [1]
Assauer, Kathrin Petra [1]
Lingner, Thomas [1]
Meinicke, Peter [1]
Affiliation
[1] Univ Gottingen, Inst Microbiol & Genet, Dept Bioinformat, D-37077 Gottingen, Germany
Keywords
PHYLOGENETIC CLASSIFICATION; TAXONOMIC CLASSIFICATION; GENOMIC FRAGMENTS; WEB SERVER; DATABASES; TRAITS; NBC;
DOI
10.1093/bioinformatics/btt077
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
070307 [Chemical Biology];
Abstract
Motivation: Metagenome analysis requires tools that can estimate the taxonomic abundances in anonymous sequence data over the whole range of biological entities. Because there is usually no prior knowledge about the data composition, not only all domains of life but also viruses have to be included in taxonomic profiling. Such a full-range approach, however, is difficult to realize owing to the limited coverage of available reference data. In particular, archaea and viruses are generally not well represented by current genome databases.
Results: We introduce a novel approach to taxonomic profiling of metagenomes that is based on mixture model analysis of protein signatures. Our results on simulated and real data reveal the difficulties of the existing methods when measuring archaeal or viral abundances and show the overall good profiling performance of the protein-based mixture model. As an application example, we provide a large-scale analysis of data from the Human Microbiome Project. This demonstrates the utility of our method as a first-instance profiling tool for a fast estimate of the community structure.
Pages: 973-980
Page count: 8
Related papers
30 in total
[1]
BASIC LOCAL ALIGNMENT SEARCH TOOL [J].
ALTSCHUL, SF ;
GISH, W ;
MILLER, W ;
MYERS, EW ;
LIPMAN, DJ .
JOURNAL OF MOLECULAR BIOLOGY, 1990, 215 (03) :403-410
[2]
PhymmBL expanded: confidence scores, custom databases, parallelization and more [J].
Brady, Arthur ;
Salzberg, Steven .
NATURE METHODS, 2011, 8 (05) :367-367
[3]
Brady A, 2009, NAT METHODS, V6, P673, DOI 10.1038/nmeth.1358
[4]
AN ORDINATION OF THE UPLAND FOREST COMMUNITIES OF SOUTHERN WISCONSIN [J].
BRAY, JR ;
CURTIS, JT .
ECOLOGICAL MONOGRAPHS, 1957, 27 (04) :326-349
[5]
MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP ;
LAIRD, NM ;
RUBIN, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38
[6]
TACOA - Taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach [J].
Diaz, Naryttza N. ;
Krause, Lutz ;
Goesmann, Alexander ;
Niehaus, Karsten ;
Nattkemper, Tim W. .
BMC BIOINFORMATICS, 2009, 10
[7]
Towards quantitative metagenomics of wild viruses and other ultra-low concentration DNA samples: a rigorous assessment and optimization of the linker amplification method [J].
Duhaime, Melissa B. ;
Deng, Li ;
Poulos, Bonnie T. ;
Sullivan, Matthew B. .
ENVIRONMENTAL MICROBIOLOGY, 2012, 14 (09) :2526-2537
[8]
HMMER web server: interactive sequence similarity searching [J].
Finn, Robert D. ;
Clements, Jody ;
Eddy, Sean R. .
NUCLEIC ACIDS RESEARCH, 2011, 39 :W29-W37
[9]
The Pfam protein families database [J].
Finn, Robert D. ;
Mistry, Jaina ;
Tate, John ;
Coggill, Penny ;
Heger, Andreas ;
Pollington, Joanne E. ;
Gavin, O. Luke ;
Gunasekaran, Prasad ;
Ceric, Goran ;
Forslund, Kristoffer ;
Holm, Liisa ;
Sonnhammer, Erik L. L. ;
Eddy, Sean R. ;
Bateman, Alex .
NUCLEIC ACIDS RESEARCH, 2010, 38 :D211-D222
[10]
SOrt-ITEMS: Sequence orthology based approach for improved taxonomic estimation of metagenomic sequences [J].
Haque, Monzoorul M. ;
Ghosh, Tarini Shankar ;
Komanduri, Dinakar ;
Mande, Sharmila S. .
BIOINFORMATICS, 2009, 25 (14) :1722-1730