Joint Analysis of Multiple Metagenomic Samples

Times cited: 15
Authors
Baran, Yael [1]
Halperin, Eran [1, 2, 3]
Affiliations
[1] Tel Aviv Univ, Sch Comp Sci, IL-69978 Tel Aviv, Israel
[2] Tel Aviv Univ, Dept Mol Microbiol & Biotechnol, IL-69978 Tel Aviv, Israel
[3] Int Comp Sci Inst, Berkeley, CA 94704 USA
Funding
Israel Science Foundation
Keywords
SEQUENCES; OBJECTS; ROBUST
DOI
10.1371/journal.pcbi.1002373
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification code
071010; 081704
Abstract
The availability of metagenomic sequencing data, generated by sequencing DNA pooled from multiple microbes living jointly, has increased sharply in the last few years with developments in sequencing technology. Characterizing the contents of metagenomic samples is a challenging task, which has been attempted extensively with both supervised and unsupervised techniques, each with its own limitations. Practically all of these methods process single samples only: when multiple samples are sequenced, each is analyzed separately and the results are combined. In this paper we propose to analyze a set of samples jointly in order to obtain a better characterization of each sample, and we provide two applications of this principle. First, we use an unsupervised probabilistic mixture model to infer hidden components shared across metagenomic samples. We incorporate the model in a novel framework for studying the association of microbial sequence elements with phenotypes, analogous to the genome-wide association studies performed on human genomes. We demonstrate that stratification may result in false discoveries of such associations, and that the components inferred by the model can be used to correct for this stratification. Second, we propose a novel read clustering (also termed "binning") algorithm that operates on multiple samples simultaneously, leveraging the assumption that the different samples contain the same microbial species, possibly in different proportions. We show that integrating information across multiple samples yields more precise binning of each sample. Moreover, for both applications we demonstrate that, given a fixed depth of coverage, the average per-sample performance generally increases with the number of sequenced samples as long as the per-sample coverage is high enough.
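To make the idea of "hidden components shared across metagenomic samples" concrete, here is a minimal sketch of one way such components could be inferred. It is not the authors' model. As an assumption, each sample is summarised as a vector of counts over sequence elements (for example k-mers or gene families), and a PLSA-style mixture (probabilistic latent semantic analysis) is fitted by EM: each component is a distribution over elements shared by all samples, while each sample has its own mixing weights. The function names `fit_shared_mixture` and `log_likelihood` and the toy counts are hypothetical.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def fit_shared_mixture(
    counts: List[List[int]],  # counts[s][w]: occurrences of element w in sample s
    n_components: int,
    n_iter: int = 200,
    seed: int = 0,
) -> Tuple[Matrix, Matrix]:
    """Hypothetical PLSA-style EM (an assumption, not the paper's model).

    phi[k][w]   = P(element w | component k), shared by all samples.
    theta[s][k] = P(component k | sample s), specific to each sample.
    """
    rng = random.Random(seed)
    n_samples, n_elems = len(counts), len(counts[0])

    def normalize(v: List[float]) -> List[float]:
        total = sum(v)
        return [x / total for x in v]

    # Random strictly positive initialisation, normalised to probability vectors.
    phi = [normalize([rng.random() + 0.1 for _ in range(n_elems)])
           for _ in range(n_components)]
    theta = [normalize([rng.random() + 0.1 for _ in range(n_components)])
             for _ in range(n_samples)]

    for _ in range(n_iter):
        # Expected counts; a tiny floor keeps every probability strictly positive.
        phi_acc = [[1e-12] * n_elems for _ in range(n_components)]
        theta_acc = [[1e-12] * n_components for _ in range(n_samples)]
        for s in range(n_samples):
            for w in range(n_elems):
                c = counts[s][w]
                if c == 0:
                    continue
                # E-step: posterior responsibility of each component for (s, w).
                joint = [theta[s][k] * phi[k][w] for k in range(n_components)]
                z = sum(joint)
                for k in range(n_components):
                    r = c * joint[k] / z
                    phi_acc[k][w] += r
                    theta_acc[s][k] += r
        # M-step: renormalise the expected counts.
        phi = [normalize(row) for row in phi_acc]
        theta = [normalize(row) for row in theta_acc]
    return phi, theta


def log_likelihood(counts: List[List[int]], phi: Matrix, theta: Matrix) -> float:
    """Multinomial log-likelihood of the counts under the fitted mixture."""
    total = 0.0
    for s, row in enumerate(counts):
        for w, c in enumerate(row):
            if c:
                p = sum(theta[s][k] * phi[k][w] for k in range(len(phi)))
                total += c * math.log(p)
    return total


if __name__ == "__main__":
    # Toy data (hypothetical): two "species" profiles mixed in different proportions.
    toy = [[50, 50, 0, 0], [0, 0, 50, 50], [25, 25, 25, 25]]
    phi, theta = fit_shared_mixture(toy, n_components=2)
    for s, weights in enumerate(theta):
        print(s, [round(x, 3) for x in weights])
    print("log-likelihood:", round(log_likelihood(toy, phi, theta), 2))
```

Because the components are shared, evidence from every sample sharpens each component, and each sample's weights are then estimated against those sharper components. The per-sample weights `theta` are the kind of quantity the abstract proposes using to correct for stratification; treating them as covariates when testing a sequence element for association with a phenotype is one plausible use, assumed here and not shown.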
Pages: 11