Image analysis for efficient categorization of image-based spam e-mail

Cited by: 40
Authors
Aradhye, HB [1]
Myers, GK [1]
Herson, JA [1]
Affiliation
[1] SRI International, Menlo Park, CA 94025, USA
Source
EIGHTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS 1 AND 2, PROCEEDINGS | 2005
DOI
10.1109/ICDAR.2005.135
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
To circumvent prevalent text-based anti-spam filters, spammers have begun embedding advertisement text in images. Analogously, proprietary information (such as source code) may be communicated as screenshots to defeat text-based monitoring of outbound e-mail. The proposed method separates spam images from other common categories of e-mail images based on extracted overlay text and color features. No expensive OCR processing is necessary. The method works robustly in spite of complex backgrounds, compression artifacts, and a wide variety of formats and fonts of overlaid spam text. It is also shown to detect screenshots in outbound e-mail successfully.
Pages: 914-918
Page count: 5