Investigating microbial co-occurrence patterns based on metagenomic compositional data

Cited by: 78
Authors
Ban, Yuguang [1]
An, Lingling [2, 3]
Jiang, Hongmei [1]
Affiliations
[1] Northwestern Univ, Dept Stat, Evanston, IL 60208 USA
[2] Univ Arizona, Interdisciplinary Program Stat, Tucson, AZ 85721 USA
[3] Univ Arizona, Dept Agr & Biosyst Engn, Tucson, AZ 85721 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA); National Institute of Food and Agriculture (USA);
Keywords
STABILITY SELECTION; VARIABLE SELECTION; BIOFILM FORMATION; BACTEROIDES; ENVIRONMENT; REGRESSION; BACTERIA; NETWORK;
DOI
10.1093/bioinformatics/btv364
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
070307 [Chemical Biology];
Abstract
Motivation: High-throughput sequencing technologies provide a powerful tool for studying the microbial organisms living in various environments. Characterizing microbial interactions can give insight into how these organisms live and work together as a community. Metagenomic data are usually summarized in a compositional fashion because sampling and sequencing depths vary from one sample to another. We study the co-occurrence patterns of microbial organisms using their relative abundance information. Analyzing compositional data with conventional correlation methods has been shown to be prone to bias that leads to artifactual correlations.
Results: We propose a novel method, regularized estimation of the basis covariance based on compositional data (REBACCA), to identify significant co-occurrence patterns by finding sparse solutions to a rank-deficient system. Specifically, we construct the system using log ratios of count or proportion data and solve it using the l1-norm shrinkage method. Our comprehensive simulation studies show that REBACCA (i) generally achieves higher accuracy than the existing methods when a sparsity condition is satisfied; (ii) controls the false positives at a pre-specified level, while other methods fail to do so in various cases; and (iii) runs considerably faster than the existing comparable method. REBACCA is also applied to several real metagenomic datasets.
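The idea of building a rank-deficient linear system from log ratios and solving it with an l1 penalty can be illustrated with a minimal sketch. This is not the authors' REBACCA implementation. It relies only on the standard identity Var(log(x_i / x_j)) = w_ii + w_jj - 2 w_ij, where w denotes the covariance of the log basis abundances. The function names, the pseudo-count, the penalty value `lam`, and the use of a plain proximal-gradient (ISTA) solver are all assumptions made for illustration; tuning and error-control steps are omitted.

```python
import numpy as np

def log_ratio_variances(x, pseudo=0.5):
    """T[i, j] = sample variance of log(x_i / x_j) across samples (rows)."""
    logx = np.log(x + pseudo)  # pseudo-count (an assumption) avoids log(0)
    p = logx.shape[1]
    T = np.zeros((p, p))
    for i in range(p):
        for j in range(i + 1, p):
            T[i, j] = T[j, i] = np.var(logx[:, i] - logx[:, j], ddof=1)
    return T

def build_system(T):
    """Encode t_ij = w_ii + w_jj - 2 * w_ij as A @ theta = t.

    theta stacks the p variances first, then the p(p-1)/2 covariances,
    so there are more unknowns than equations (rank-deficient system).
    """
    p = T.shape[0]
    pairs = [(i, j) for i in range(p) for j in range(i + 1, p)]
    A = np.zeros((len(pairs), p + len(pairs)))
    t = np.empty(len(pairs))
    for k, (i, j) in enumerate(pairs):
        A[k, i] = 1.0
        A[k, j] = 1.0
        A[k, p + k] = -2.0
        t[k] = T[i, j]
    return A, t, pairs

def l1_solve(A, t, p, lam=0.05, n_iter=5000):
    """ISTA for 0.5 * ||A theta - t||^2 + lam * ||covariance part of theta||_1.

    Only the covariance entries (theta[p:]) are shrunk; variances are not.
    """
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    theta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        theta = theta - step * (A.T @ (A @ theta - t))
        off = theta[p:]
        theta[p:] = np.sign(off) * np.maximum(np.abs(off) - step * lam, 0.0)
    return theta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    counts = rng.integers(1, 200, size=(50, 5)).astype(float)  # toy count table
    T = log_ratio_variances(counts)
    A, t, pairs = build_system(T)
    theta = l1_solve(A, t, p=counts.shape[1])
    for (i, j), w in zip(pairs, theta[counts.shape[1]:]):
        print(f"taxa {i}-{j}: estimated basis covariance {w:.3f}")
```

Because the per-sample total cancels inside each ratio, the log-ratio variances are the same whether they are computed from counts or from proportions (when no pseudo-count is added), which is why the system can be built from either form of the data.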
Pages: 3322-3329
Number of pages: 8