A Comparison of Text-Classification Techniques Applied to Arabic Text

Cited by: 30
Authors
Kanaan, Ghassan [1]
Al-Shalabi, Riyad [1]
Ghwanmeh, Sameh [2]
Al-Ma'adeed, Hamda [1]
Affiliations
[1] Arab Acad Banking & Financial Serv, Amman, Jordan
[2] Yarmouk Univ, Dept Comp Engn, Irbid, Jordan
Source
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY | 2009 / Vol. 60 / Issue 09
DOI
10.1002/asi.20832
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Many algorithms have been implemented for text classification, but most of this work has targeted English text, and very little research has addressed Arabic text. Arabic text differs in nature from English text, and preprocessing it is more challenging. This paper presents an implementation of three automatic text-classification techniques for Arabic text. A corpus of 1445 Arabic text documents belonging to nine categories was automatically classified using the kNN, Rocchio, and naive Bayes algorithms. The results show that naive Bayes performed best, followed by kNN and Rocchio.
Pages: 1836-1844
Page count: 9