From learning taxonomies to phylogenetic learning: Integration of 16S rRNA gene data into FAME-based bacterial classification

Cited by: 14
Authors
Slabbinck, Bram [1, 2]
Waegeman, Willem [2]
Dawyndt, Peter [3]
De Vos, Paul [1, 4]
De Baets, Bernard [2]
Affiliations
[1] Univ Ghent, Microbiol Lab, B-9000 Ghent, Belgium
[2] Univ Ghent, KEMIT, Dept Appl Math Biometr & Proc Control, B-9000 Ghent, Belgium
[3] Univ Ghent, Dept Appl Math & Comp Sci, B-9000 Ghent, Belgium
[4] BCCM TM LMG Bacteria Collect, B-9000 Ghent, Belgium
Source
BMC BIOINFORMATICS | 2010 / Vol. 11
Keywords
AD-HOC-COMMITTEE; SPECIES DEFINITION; IDENTIFICATION; TREE; TOOL
DOI
10.1186/1471-2105-11-69
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Machine learning techniques have been shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has limited resolution for discriminating bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy, or tree, is typically obtained by applying clustering algorithms to FAME data or to 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification.

Results: With a view to learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Second, phylogenetic trees are inferred from 16S rRNA gene sequence analysis by the neighbor-joining (NJ) and UPGMA methods. In this second approach, the species classification problem combines two different types of data: 16S rRNA gene sequence data are used for phylogenetic tree inference, and the corresponding binary tree splits are learned from FAME data. We call this learning approach 'phylogenetic learning'. Random Forest models are trained on the resulting classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish with a single, flat multi-class classification model.

Conclusions: FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it is better at distinguishing species on which flat multi-class classification fails. Second, the hierarchical classification structure makes it easy to evaluate and visualize the resolution of FAME data for discriminating bacterial species. In summary, phylogenetic learning allows us to situate and evaluate FAME-based bacterial species classification in a more informative context.
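
The idea of learning the binary splits of a fixed tree from a second data type can be illustrated with a minimal sketch. It is not the authors' implementation: the tree, the species names, and the two-feature "FAME profiles" below are hypothetical toy values, and a nearest-centroid rule stands in for the Random Forest models named in the abstract so that the example needs only the Python standard library. One binary model is fitted per internal node, and a sample is classified by walking from the root to a leaf.

```python
from statistics import fmean

# A tree is either a species name (leaf) or a (left, right) tuple (binary split).
# Hypothetical toy tree standing in for one inferred from 16S rRNA gene data.
TREE = (("species_A", "species_B"), "species_C")

# Hypothetical toy feature vectors standing in for FAME profiles.
X = [[10, 1], [11, 2], [14, 1], [15, 2], [30, 20], [31, 21]]
y = ["species_A", "species_A", "species_B", "species_B", "species_C", "species_C"]


def leaves(tree):
    """Return the set of species names below a node."""
    if isinstance(tree, str):
        return {tree}
    return leaves(tree[0]) | leaves(tree[1])


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def train(tree, X, y):
    """Fit one binary model per internal node, using only samples below it.

    Assumption: nearest-centroid replaces the Random Forest of the abstract.
    """
    if isinstance(tree, str):
        return tree
    left, right = tree
    left_species, right_species = leaves(left), leaves(right)
    left_rows = [x for x, label in zip(X, y) if label in left_species]
    right_rows = [x for x, label in zip(X, y) if label in right_species]
    return {
        "centroids": (centroid(left_rows), centroid(right_rows)),
        "children": (train(left, X, y), train(right, X, y)),
    }


def predict(model, x):
    """Walk from the root to a leaf, taking the nearer side at each split."""
    while not isinstance(model, str):
        c_left, c_right = model["centroids"]
        go_right = sq_dist(x, c_right) < sq_dist(x, c_left)
        model = model["children"][int(go_right)]
    return model


model = train(TREE, X, y)
print(predict(model, [14, 2]))
```

Because each node sees only the samples of the species below it, the accuracy of every split can be inspected separately, which is the property the abstract relies on for evaluating the resolution of FAME data along the tree.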
Pages: 16