Bayesian additive regression trees-based spam detection for enhanced email privacy

Cited by: 10
Authors
Abu-Nimeh, Saeed [1]
Nappa, Dario [1]
Wang, Xinlei [1]
Nair, Suku [1]
Affiliation
[1] Southern Methodist Univ, SMU HACNet Lab, Dallas, TX 75275 USA
Source
ARES 2008: PROCEEDINGS OF THE THIRD INTERNATIONAL CONFERENCE ON AVAILABILITY, SECURITY AND RELIABILITY | 2008
Keywords
BART; CART; classification; logistic regression; NNet; random forests; spam; SVM
DOI
10.1109/ARES.2008.136
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Spam is considered an invasion of privacy. Its changeable structures and variability raise the need for new spam classification techniques. The present study proposes using Bayesian Additive Regression Trees (BART) for spam classification and evaluates its performance against other classification methods, including Logistic Regression, Support Vector Machines, Classification and Regression Trees, Neural Networks, Random Forests, and Naive Bayes. BART in its original form is not designed for such problems; hence, we modify BART to make it applicable to classification problems. We evaluate the classifiers on three spam datasets (Ling-Spam, PU1, and Spambase) to determine the predictive accuracy and the false positive rate.
Pages: 1044-1051
Number of pages: 8