Capreomycin resistance prediction in two species of Mycobacterium using a stacked ensemble method

Cited by: 19
Authors
Chowdhury, A. S. [1]
Khaledian, E. [1]
Broschat, S. L. [1, 2, 3]
Affiliations
[1] Washington State Univ, Sch Elect Engn & Comp Sci, POB 642752, Pullman, WA 99164 USA
[2] Washington State Univ, Paul G Allen Sch Global Anim Hlth, Pullman, WA 99164 USA
[3] Washington State Univ, Dept Vet Microbiol & Pathol, Pullman, WA 99164 USA
Keywords
antibiotic resistance; capreomycin resistance; ensemble learning; feature selection; machine learning; physicochemical features; secondary structure features; tuberculosis; ANTIBIOTIC-RESISTANCE; ANTIMICROBIAL RESISTANCE; PROTEIN FOLD; CD-HIT; RECOGNITION; GENERATION; SEQUENCE; DATABASE
DOI
10.1111/jam.14413
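Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification code
071005 [Microbiology]; 090105 [Crop Production Systems and Ecological Engineering]
Abstract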
Aims: Predicting bacterial resistance provides valuable information that can assist in clinical decisions. With recent advances in whole genome sequencing technology, the detection of antibiotic resistance (AR) proteins directly from genomic data is becoming feasible. AR genes/proteins can be identified using best-hit methods that work by comparing candidate sequences with known AR genes in public databases. However, these approaches may fail to detect resistance genes with sequences that differ significantly from known sequences. Our goal is to develop a machine learning technique to accurately predict capreomycin resistance in Mycobacteria with low false discovery rates.

Methods and Results: We present a stacked ensemble learning model as an alternative to traditional DNA sequence alignment-based methods using optimal features generated from the physicochemical, evolutionary and secondary structure properties of protein sequences. We train logistic regression, C5.0 and support vector machine (SVM) algorithms as our base classifiers, and our stacked ensemble predictors combine the results from the base classifiers to achieve higher accuracy. Compared with our most accurate base classifier (SVM), our most accurate stacked ensemble predictor increases training accuracy by 2.43%. Our stacked ensemble predictors achieve test accuracy up to 81.25%.

Conclusions: We developed a stacked ensemble model to predict capreomycin resistance for Mycobacteria with an accuracy >80% using protein sequences with sequence similarity ranging between 10% and 70%. This performance cannot be achieved with best-hit methods due to differences in sequence similarity.

Significance and Impact of the Study: Today an estimated one-half million cases of multidrug-resistant (MDR) and extensively drug-resistant (XDR) tuberculosis (TB) occur annually worldwide at a great cost. Because capreomycin is a second-line drug used to treat drug-resistant TB, the ability to use a machine learning approach to classify capreomycin-resistant TB in a timely manner is crucial for the successful treatment of MDR or XDR TB.
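
The stacking idea described in the abstract (base classifiers whose outputs are combined by a second-level predictor) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the base learners are simple single-feature threshold rules standing in for logistic regression, C5.0 and SVM, and the meta-learner is trained on a separate held-out split, which is only one of several ways to build a stacked ensemble. All function names (`make_data`, `train_stump`, `train_meta`, `run_demo`) are hypothetical.

```python
import math
import random


def accuracy(preds, targets):
    """Fraction of predictions that match the targets."""
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)


def make_data(n, rng, noise=0.8):
    """Synthetic stand-in for protein feature vectors: 3 noisy features per sample."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([label + rng.gauss(0.0, noise) for _ in range(3)])
        y.append(label)
    return X, y


def train_stump(X, y, feature):
    """Toy base learner: the best threshold rule on a single feature."""
    best_acc, best_rule = -1.0, None
    for threshold in sorted({row[feature] for row in X}):
        for sign in (1, -1):
            preds = [int(sign * (row[feature] - threshold) >= 0) for row in X]
            acc = accuracy(preds, y)
            if acc > best_acc:
                best_acc, best_rule = acc, (threshold, sign)
    threshold, sign = best_rule
    return lambda row: int(sign * (row[feature] - threshold) >= 0)


def train_meta(Z, y, epochs=500, lr=0.5):
    """Meta-learner: logistic regression on base predictions (batch gradient descent)."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for z, target in zip(Z, y):
            score = sum(wi * zi for wi, zi in zip(w, z)) + b
            err = 1.0 / (1.0 + math.exp(-score)) - target
            grad_w = [g + err * zi for g, zi in zip(grad_w, z)]
            grad_b += err
        w = [wi - lr * g / len(y) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(y)
    return lambda z: int(sum(wi * zi for wi, zi in zip(w, z)) + b >= 0)


def run_demo(seed=0):
    rng = random.Random(seed)
    X_base, y_base = make_data(200, rng)  # used to fit the base learners
    X_meta, y_meta = make_data(200, rng)  # held out to fit the meta-learner
    X_test, y_test = make_data(200, rng)  # used only for evaluation

    bases = [train_stump(X_base, y_base, f) for f in range(3)]

    def stack_inputs(X):
        # Each sample becomes the vector of base-classifier predictions.
        return [[clf(row) for clf in bases] for row in X]

    meta = train_meta(stack_inputs(X_meta), y_meta)
    base_accs = [accuracy([clf(r) for r in X_test], y_test) for clf in bases]
    stacked_preds = [meta(z) for z in stack_inputs(X_test)]
    return {"base": base_accs,
            "stacked": accuracy(stacked_preds, y_test),
            "preds": stacked_preds}


result = run_demo(0)
print("base accuracies:", result["base"])
print("stacked accuracy:", result["stacked"])
```

Fitting the meta-learner on samples the base learners never saw keeps it from simply rewarding whichever base learner overfit its own training data. Whether the stacked predictor beats the best base learner depends on the data, so the sketch prints both rather than assuming an improvement.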
Pages: 1656-1664
Number of pages: 9