Applications of Text Clustering Based on Semantic Body for Chinese Spam Filtering

Cited by: 8
Authors
Zhang, Qiu-Yu [1, 2]
Wang, Peng [1 ]
Yang, Hui-Juan [1 ]
Affiliations
[1] Lanzhou Univ Technol, Sch Comp & Commun, Lanzhou, Gansu, Peoples R China
[2] Technol & Res Ctr Gansu Mfg Informatizat Engn, Lanzhou, Gansu, Peoples R China
Keywords
semantic body; lexical chain; semantic similarity; text clustering; spam filter
DOI
10.4304/jcp.7.11.2612-2616
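Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203 [Computer Application Technology]; 0835 [Software Engineering]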
Abstract
Statistics-based spam filtering performs poorly on new-type spam that uses synonym substitution and camouflage, because it judges each word in isolation and ignores the semantic relations between words in the text. This paper therefore proposes a spam filtering method based on the semantic body. The method combines lexical chains built on HowNet with statistics-based TF-IDF weighting to extract e-mail features, and then handles spam with text clustering. Experimental results show that the proposed method is effective at filtering new-type spam.
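The abstract's core argument, that word-level statistics miss synonym substitution while concept-level features do not, can be illustrated with a minimal sketch. It is not the authors' implementation: the `CONCEPT_OF` table is a hypothetical stand-in for a HowNet lookup, the toy e-mails are invented, and the `tf * log(N / df)` weighting is one common TF-IDF variant assumed here. The lexical-chain construction and the clustering step are omitted.

```python
import math
from collections import Counter

# Hypothetical stand-in for a HowNet lookup: surface word -> concept label.
CONCEPT_OF = {
    "cheap": "low_price", "discount": "low_price",
    "pills": "medicine", "tablets": "medicine",
}

def to_concepts(tokens):
    """Replace each token with its concept label, if it has one."""
    return [CONCEPT_OF.get(t, t) for t in tokens]

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy, already-tokenized e-mails: the second rewrites the first with synonyms.
emails = [
    ["cheap", "pills", "online"],
    ["discount", "tablets", "online"],
    ["meeting", "agenda", "monday"],
]

raw = tfidf_vectors(emails)                              # word-level features
concept = tfidf_vectors([to_concepts(e) for e in emails])  # concept-level features

print("word-level similarity:   ", cosine(raw[0], raw[1]))
print("concept-level similarity:", cosine(concept[0], concept[1]))
```

In this toy setting the two synonym-substituted e-mails share only one surface word, so their word-level similarity is low, while mapping words to shared concepts makes their feature vectors coincide. A clustering step would then group e-mails by such similarities.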
Pages: 2612-2616
Page count: 5