Support vector machines for spam categorization

Cited by: 785
Authors
Drucker, H [1]
Wu, DH
Vapnik, VN
Affiliations
[1] AT&T Bell Labs, Res, Red Bank, NJ 07701 USA
[2] Monmouth Univ, Dept Elect Engn, W Long Branch, NJ 07764 USA
[3] Rensselaer Polytech Inst, Troy, NY 12181 USA
Source
IEEE TRANSACTIONS ON NEURAL NETWORKS | 1999 / Vol. 10 / Issue 05
Keywords
boosting algorithms; classification; e-mail; feature representation; Ripper; Rocchio; support vector machines;
DOI
10.1109/72.788645
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study the use of support vector machines (SVM's) in classifying e-mail as spam or nonspam by comparing it to three other classification algorithms: Ripper, Rocchio, and boosting decision trees. These four algorithms were tested on two different data sets: one data set where the number of features was constrained to the 1000 best features and another data set where the dimensionality was over 7000. SVM's performed best when using binary features. For both data sets, boosting trees and SVM's had acceptable test performance in terms of accuracy and speed. However, SVM's had significantly less training time.
Pages: 1048-1054
Page count: 7