Bayesian estimation of bacterial community composition from 454 sequencing data

Cited by: 21
Authors
Cheng, Lu [1]
Walker, Alan W. [2]
Corander, Jukka [1]
Affiliations
[1] Univ Helsinki, Dept Math & Stat, FIN-00014 Helsinki, Finland
[2] Wellcome Trust Sanger Inst, Pathogen Genom Grp, Hinxton CB10 1SA, Cambs, England
Funding
European Research Council;
Keywords
16S RIBOSOMAL-RNA; ALIGNMENT; ACCURACY; MUSCLE;
DOI
10.1093/nar/gks227
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification code
071010 ; 081704 ;
Abstract
Estimating the composition of a bacterial community from a mixed sample is an important task for microbiologists across many applied contexts. Community composition is commonly estimated by clustering polymerase chain reaction (PCR)-amplified 16S rRNA gene sequences. Current taxonomy-independent clustering methods for these sequences, such as UCLUST, ESPRIT-Tree and CROP, have two limitations: (i) they require expert knowledge, because the user must specify a difference cutoff between species; and (ii) they cannot separate closely related species. The first limitation burdens the user, who must spend considerable effort selecting appropriate parameters; the second leads to an inaccurate description of the underlying community composition. We propose a probabilistic model-based method for estimating bacterial community composition that addresses both limitations. The method requires very little expert knowledge: only the maximum possible number of clusters needs to be specified. In two experiments, it separates closely related species despite sequencing errors and individual variation.
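The cutoff dependence described in limitation (i) can be illustrated with a minimal sketch. This is not the algorithm used by UCLUST, ESPRIT-Tree or CROP, and it is not the method proposed in the paper; it is a generic greedy threshold clustering on toy data. The function names `hamming_fraction` and `greedy_cluster`, the five reads, and the cutoff values are all hypothetical and chosen only for illustration.

```python
def hamming_fraction(a: str, b: str) -> float:
    """Fraction of mismatching positions between two aligned, equal-length reads."""
    if len(a) != len(b):
        raise ValueError("reads must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_cluster(reads, cutoff):
    """Put each read in the first cluster whose centroid is within `cutoff`;
    otherwise open a new cluster with that read as its centroid."""
    centroids = []
    clusters = []
    for read in reads:
        for i, centroid in enumerate(centroids):
            if hamming_fraction(read, centroid) <= cutoff:
                clusters[i].append(read)
                break
        else:
            centroids.append(read)
            clusters.append([read])
    return clusters


# Toy, made-up reads: two closely related "species" (A and B differ at one
# position), each with one read carrying a simulated sequencing error, plus
# one distant "species" C.
reads = [
    "ACGTACGTAC",  # A
    "ACGTACGTAT",  # A with one error
    "ACGTTCGTAC",  # B
    "ACGTTCGTAT",  # B with one error
    "TTGGCCAATT",  # C
]

for cutoff in (0.05, 0.15, 0.25):
    print(cutoff, len(greedy_cluster(reads, cutoff)))
```

On this toy input the number of clusters changes with the cutoff (5, 3 and 2 clusters for cutoffs 0.05, 0.15 and 0.25), and at 0.15 reads from A and B end up in the same cluster, which is the kind of behaviour the two limitations describe.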
Pages: 5240-5249
Number of pages: 10