A review of machine learning approaches to Spam filtering

Cited by: 307

Authors:

Guzella, Thiago S. [1]

Caminhas, Walmir M. [1]

Affiliations:

[1] Univ Fed Minas Gerais, Dept Elect Engn, BR-31270910 Belo Horizonte, MG, Brazil

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2009 / Vol. 36 / Issue 07

Keywords:

Spam filtering; Online learning; Bag-of-words (BoW); Naive Bayes; Image spam; ARTIFICIAL IMMUNE-SYSTEM; SUPPORT VECTOR MACHINES; FEATURE-SELECTION; CONCEPT DRIFT; CLASSIFICATION; EXTRACTION; GENERATION; KNOWLEDGE; MESSAGES; MODELS;

DOI:

10.1016/j.eswa.2009.02.037

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we present a comprehensive review of recent developments in the application of machine learning algorithms to Spam filtering, focusing on both textual- and image-based approaches. Instead of considering Spam filtering as a standard classification problem, we highlight the importance of considering specific characteristics of the problem, especially concept drift, in designing new filters. Two particularly important aspects not widely recognized in the literature are discussed: the difficulties in updating a classifier based on the bag-of-words representation and a major difference between two early naive Bayes models. Overall, we conclude that while important advancements have been made in recent years, several aspects remain to be explored, especially under more realistic evaluation settings. (C) 2009 Elsevier Ltd. All rights reserved.

Pages: 10206-10222

Page count: 17
