Classification of genomic islands using decision trees and their ensemble algorithms

Cited by: 15
Authors
Che, Dongsheng [1]
Hockenbury, Cory [1]
Marmelstein, Robert [1]
Rasheed, Khaled [2]
Affiliations
[1] E Stroudsburg Univ, Dept Comp Sci, E Stroudsburg, PA 18301 USA
[2] Univ Georgia, Dept Comp Sci, Athens, GA 30602 USA
Source
BMC GENOMICS | 2010 / Vol. 11
Keywords
HORIZONTAL GENE-TRANSFER; PATHOGENICITY ISLANDS; BACTERIAL GENOMES; EVOLUTION
DOI
10.1186/1471-2164-11-S2-S1
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification code
071005; 0836; 090102; 100705
Abstract
Background: Genomic islands (GIs) are clusters of alien genes that are present in some bacterial genomes but are not seen in the genomes of other strains within the same genus. The detection of GIs is extremely important to the medical and environmental communities. Despite the discovery of GI-associated features, accurate detection of GIs is still far from satisfactory. Results: In this paper, we combined multiple GI-associated features, then applied and compared various machine learning approaches to evaluate the classification accuracy on GI datasets from three genera (Salmonella, Staphylococcus, and Streptococcus) and on a mixed dataset of all three genera. The experimental results showed that, in general, the decision tree approach outperformed the other machine learning methods according to five performance evaluation metrics. Using J48 decision trees as base classifiers, we further applied four ensemble algorithms (AdaBoost, bagging, MultiBoost, and random forest) to the same datasets. We found that, overall, these ensemble classifiers could improve classification accuracy. Conclusions: We conclude that decision-tree-based ensemble algorithms can accurately classify GIs and non-GIs, and we recommend the use of these methods for future GI data analysis. The software package for detecting GIs can be accessed at http://www.esu.edu/cpsc/che_lab/software/GIDetector/.
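Illustrative sketch (not part of the original record): the abstract describes wrapping decision-tree base classifiers in ensemble algorithms such as bagging. The block below is a minimal standard-library Python sketch of that general idea only. It is not the authors' pipeline: one-level decision stumps stand in for J48 (Weka's C4.5 implementation), only bagging is shown, and the two feature columns, the toy values in `ROWS`, and all function names are hypothetical placeholders rather than data or code from the paper.

```python
import random
from collections import Counter

# Hypothetical toy data: each row is (feature_0, feature_1); label 1 = GI, 0 = non-GI.
# The values are invented for illustration and are NOT taken from the paper.
ROWS = [(0.90, 0.2), (0.80, 0.7), (0.85, 0.4), (0.95, 0.9),
        (0.10, 0.3), (0.20, 0.8), (0.15, 0.5), (0.05, 0.6)]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]


def train_stump(rows, labels):
    """Fit a one-level tree: the (feature, threshold, left_label) split with fewest errors."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            for left_label in (0, 1):
                # Rows with value <= t get left_label, the rest get the other class.
                errors = sum(
                    (left_label if r[f] <= t else 1 - left_label) != y
                    for r, y in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, left_label)
    _, f, t, left_label = best
    return f, t, left_label


def predict_stump(stump, row):
    """Classify one row with a fitted stump."""
    f, t, left_label = stump
    return left_label if row[f] <= t else 1 - left_label


def bagging(rows, labels, n_models=25, seed=0):
    """Train n_models stumps, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx]))
    return models


def predict_bagged(models, row):
    """Combine the base classifiers by majority vote."""
    votes = Counter(predict_stump(m, row) for m in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    ensemble = bagging(ROWS, LABELS)
    for query in [(0.9, 0.5), (0.02, 0.5)]:
        print(query, "->", predict_bagged(ensemble, query))
```

Each base classifier sees a different bootstrap sample, so an individual stump may pick a poor threshold, while the majority vote tends to smooth out such errors. That is the general motivation for bagging; whether it helps on real GI data depends on the features and datasets described in the paper itself.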
Pages: 9