phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data

Times cited: 14310
Authors
McMurdie, Paul J. [1]
Holmes, Susan [1]
Affiliation
[1] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
Keywords
CD-HIT; DIVERSITY; SEQUENCE; SERVER; BIOINFORMATICS; METAGENOMICS; COMMUNITIES; RESOURCE; TOOLS
DOI
10.1371/journal.pone.0061217
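Chinese Library Classification (CLC) numbers
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]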
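Subject classification codes
070301 [Inorganic Chemistry]; 070403 [Astrophysics]; 070507 [Natural Resources and Territorial Spatial Planning]; 090105 [Crop Production Systems and Ecological Engineering]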
Abstract
Background: The analysis of microbial communities through DNA sequencing brings many challenges: the integration of different types of data with methods from ecology, genetics, phylogenetics, multivariate statistics, visualization and testing. With the increased breadth of experimental designs now being pursued, project-specific statistical analyses are often needed, and these analyses are often difficult (or impossible) for peer researchers to independently reproduce. The vast majority of the requisite tools for performing these analyses reproducibly are already implemented in R and its extensions (packages), but with limited support for high-throughput microbiome census data.

Results: Here we describe a software project, phyloseq, dedicated to the object-oriented representation and analysis of microbiome census data in R. It supports importing data from a variety of common formats, as well as many analysis techniques. These include calibration, filtering, subsetting, agglomeration, multi-table comparisons, diversity analysis, parallelized Fast UniFrac, ordination methods, and production of publication-quality graphics, all in a manner that is easy to document, share, and modify. We show how to apply functions from other R packages to phyloseq-represented data, illustrating the availability of a large number of open-source analysis techniques. We discuss the use of phyloseq with tools for reproducible research, a practice common in other fields but still rare in the analysis of highly parallel microbiome census data. We have made available all of the materials necessary to completely reproduce the analysis and figures included in this article, as an example of best practices for reproducible research.

Conclusions: The phyloseq project for R is a new open-source software package, freely available on the web from both GitHub and Bioconductor.
Pages: 11