Towards large-scale FAME-based bacterial species identification using machine learning techniques

Cited by: 33
Authors
Slabbinck, Bram [1]
De Baets, Bernard [1]
Dawyndt, Peter [2]
De Vos, Paul [3]
Affiliations
[1] Univ Ghent, Res Unit Knowledge Based Syst, Fac Biosci Engn, B-9000 Ghent, Belgium
[2] Univ Ghent, Dept Appl Math & Comp Sci, B-9000 Ghent, Belgium
[3] Univ Ghent, Microbiol Lab, BCCM LMG Bacteria Collect, B-9000 Ghent, Belgium
Keywords
Bacillus; Bacteria; Fatty acid methyl ester; Gas chromatography; Identification; Machine learning; Paenibacillus; Pseudomonas; Species; Taxonomy; FATTY-ACID-COMPOSITION; GAS-LIQUID-CHROMATOGRAPHY; CLASSIFICATION; PROFILES; MICROORGANISMS; SYSTEM; TOOL
DOI
10.1016/j.syapm.2009.01.003
CLC number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
In the last decade, bacterial taxonomy has witnessed a huge expansion. The swift pace of bacterial species (re-)definitions has a serious impact on the accuracy and completeness of first-line identification methods. Consequently, back-end identification libraries need to be synchronized with the List of Prokaryotic names with Standing in Nomenclature. In this study, we focus on bacterial fatty acid methyl ester (FAME) profiling as a broadly used first-line identification method. From the BAME@LMG database, we selected FAME profiles of individual strains belonging to the genera Bacillus, Paenibacillus and Pseudomonas, retaining only profiles obtained under standard growth conditions. The resulting data set covers 74, 44 and 95 validly published bacterial species, respectively, represented by 961, 378 and 1673 standard FAME profiles. Using a supervised strategy, we built computational models for genus and species identification with three machine learning techniques: artificial neural networks, random forests and support vector machines. Nearly perfect identification was achieved at the genus level. Despite the known limited discriminative power of FAME analysis for species identification, the computational models gave good species identification results for all three genera. For Bacillus, Paenibacillus and Pseudomonas, random forests yielded sensitivity values of 0.847, 0.901 and 0.708, respectively. The random forest models outperformed those built with the other machine learning techniques. Moreover, our machine learning approach outperformed the Sherlock MIS (MIDI Inc., Newark, DE, USA). These results show that machine learning is very useful for FAME-based bacterial species identification.
Besides good bacterial identification at the species level, speed and ease of taxonomic synchronization are major advantages of this computational species identification strategy. (C) 2009 Elsevier GmbH. All rights reserved.
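To make the reported sensitivity figures concrete, here is a minimal sketch of how sensitivity can be computed from true and predicted species labels: for each species, the number of its profiles identified correctly divided by all of its profiles, i.e. TP / (TP + FN). This is not the authors' evaluation code. The abstract does not state how per-species values were combined into one figure per genus, so both an unweighted mean over species and a profile-weighted value are shown as assumptions; the function names and the example labels are hypothetical.

```python
from collections import defaultdict


def per_species_sensitivity(true_labels, predicted_labels):
    """Sensitivity per species: TP / (TP + FN), i.e. the fraction of a
    species' profiles that were assigned to that same species."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("true and predicted label lists differ in length")
    hits = defaultdict(int)    # profiles correctly identified (TP)
    totals = defaultdict(int)  # all profiles of the species (TP + FN)
    for truth, pred in zip(true_labels, predicted_labels):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    return {species: hits[species] / totals[species] for species in totals}


def macro_sensitivity(true_labels, predicted_labels):
    """Unweighted mean over species (assumed aggregation: every species counts equally)."""
    scores = per_species_sensitivity(true_labels, predicted_labels)
    if not scores:
        raise ValueError("no labels given")
    return sum(scores.values()) / len(scores)


def micro_sensitivity(true_labels, predicted_labels):
    """Profile-weighted sensitivity; with one label per profile this equals accuracy."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


if __name__ == "__main__":
    # Hypothetical identification results for six FAME profiles.
    truth = ["B. subtilis", "B. subtilis", "B. cereus",
             "B. cereus", "B. cereus", "B. megaterium"]
    predicted = ["B. subtilis", "B. cereus", "B. cereus",
                 "B. cereus", "B. subtilis", "B. megaterium"]
    print(per_species_sensitivity(truth, predicted))
    print(round(macro_sensitivity(truth, predicted), 3))  # 0.722
    print(round(micro_sensitivity(truth, predicted), 3))  # 0.667
```

With these made-up labels the per-species values are 0.5, about 0.667 and 1.0, so the unweighted mean (about 0.722) differs from the profile-weighted value (about 0.667). Which aggregation a study reports matters whenever species are represented by different numbers of profiles.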
Pages: 163–176
Number of pages: 14