Support vector machines for spam categorization

Cited by: 785

Authors:

Drucker, H [1]

Wu, DH

Vapnik, VN

Affiliations:

[1] AT&T Bell Labs, Res, Red Bank, NJ 07701 USA

[2] Monmouth Univ, Dept Elect Engn, W Long Branch, NJ 07764 USA

[3] Rensselaer Polytech Inst, Troy, NY 12181 USA

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS | 1999 / Vol. 10 / No. 5

Keywords:

boosting algorithms; classification; e-mail; feature representation; Ripper; Rocchio; support vector machines;

DOI:

10.1109/72.788645

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We study the use of support vector machines (SVM's) in classifying e-mail as spam or nonspam by comparing them to three other classification algorithms: Ripper, Rocchio, and boosting decision trees. These four algorithms were tested on two different data sets: one in which the number of features was constrained to the 1000 best features and another in which the dimensionality was over 7000. SVM's performed best when using binary features. For both data sets, boosting trees and SVM's had acceptable test performance in terms of accuracy and speed. However, SVM's had significantly less training time.
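The abstract's point that SVM's did best with binary features can be illustrated with a small, self-contained sketch: each message becomes a 0/1 vector recording which vocabulary terms occur in it, and a linear SVM is trained on those vectors. This is not the authors' setup. The tokenizer, the document-frequency rule used as a stand-in for "the 1000 best features", and the Pegasos-style subgradient trainer (a swapped-in linear SVM solver) are all assumptions, and `build_vocabulary`, `binary_features`, `train_linear_svm`, and `predict` are hypothetical names chosen for illustration.

```python
import re


def tokenize(text):
    # Assumed tokenizer: lowercase alphanumeric runs (not the paper's own rules).
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(docs, max_features=1000):
    # Hypothetical selection rule: keep the max_features terms with the highest
    # document frequency, ties broken alphabetically. A stand-in for "best features".
    doc_freq = {}
    for doc in docs:
        for term in set(tokenize(doc)):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    ranked = sorted(doc_freq, key=lambda term: (-doc_freq[term], term))
    return {term: i for i, term in enumerate(ranked[:max_features])}


def binary_features(doc, vocab):
    # Binary representation: 1.0 if the term occurs at least once, else 0.0.
    x = [0.0] * len(vocab)
    for term in tokenize(doc):
        idx = vocab.get(term)
        if idx is not None:
            x[idx] = 1.0
    return x


def train_linear_svm(X, y, lam=0.01, epochs=50):
    # Pegasos-style subgradient descent on lam/2 * ||w||^2 + mean hinge loss.
    # Labels: +1 = spam, -1 = nonspam. A constant 1.0 feature stands in for the
    # bias (so the bias is regularized too, a simplification). Examples are
    # visited in a fixed order instead of sampled at random, for reproducibility.
    rows = [xi + [1.0] for xi in X]
    w = [0.0] * len(rows[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(rows, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step from the regularizer
            if margin < 1.0:  # hinge loss active: step toward classifying xi correctly
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w[:-1], w[-1]  # (weights, bias)


def predict(w, b, x):
    # +1 (spam) if the decision value is non-negative, else -1 (nonspam).
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Toy usage with made-up messages (illustrative only, not data from the paper).
train_docs = ["win money now", "free money offer",
              "meeting schedule tomorrow", "project meeting notes"]
labels = [1, 1, -1, -1]
vocab = build_vocabulary(train_docs)
X = [binary_features(d, vocab) for d in train_docs]
w, b = train_linear_svm(X, labels)
print(predict(w, b, binary_features("free money", vocab)))  # prints 1
```

Because the features are presence indicators, repeating a word in a message does not change its vector; only which terms appear matters, which is the representation the abstract reports as working best for SVM's.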


Pages: 1048-1054

Number of pages: 7

References (18 in total; the first 10 are listed):

[1] [Anonymous], 1998, EUR C MACH LEARN
[2] Cohen WW, 1995, P 12 INT C MACH LEAR, P115, DOI 10.1016/B978-1-55860-377-6.50023-2
[3] Cohen WW, 1996, P AAAI SPRING S
[4] Cohen WW, 1996, P 19 ANN INT ACM SIG, P307
[5] Cranor LF, LaMacchia BA, 1998, Spam!, COMMUNICATIONS OF THE ACM, V41, P74-83
[6] Drucker H, 1997, ADV NEUR IN, V9, P155
[7] Freund Y, 1996, Proceedings of the Ninth Annual Conference on Computational Learning Theory, P325, DOI 10.1145/238061.238163
[8] Freund Y, 1996, Experiments with a new boosting algorithm, Proceedings of the 13th International Conference on Machine Learning, P148-156
[9] Joachims T, 1997, P 14 INT C MACH LEAR
[10] Lewis DD, 1996, P 19 ANN INT ACM SIG, P298
