Performance of KNN and SVM classifiers on full word Arabic articles

Cited by: 76
Authors
Hmeidi, Ismail [1]
Hawashin, Bilal [1]
El-Qawasmeh, Eyas [1]
Affiliations
[1] Jordan Univ Sci & Technol, Fac Comp & Informat Technol, Irbid 22110, Jordan
Keywords
Arabic text categorization; full word features; tf.idf weighting; CHI statistics; KNN; SVM
DOI
10.1016/j.aei.2007.12.001
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper reports a comparative study of two machine learning methods for Arabic text categorization. Using one collection of news articles as a training set and another as a test set, we evaluated the k-nearest neighbor (KNN) algorithm and the support vector machine (SVM) algorithm. We used full-word features, with tf.idf as the weighting method for feature selection and CHI statistics as the ranking metric. Experiments showed that both methods performed strongly on the test corpus, while SVM achieved a better micro-averaged F1 and prediction time. (C) 2007 Elsevier Ltd. All rights reserved.
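The quantities named in the abstract (tf.idf weights, the CHI statistic, and micro-averaged F1) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the raw-count tf times log(N/df) variant, the 2x2 contingency form of CHI, the function names, and the toy English tokens and labels are all assumptions made for illustration, and the KNN and SVM classifiers themselves are omitted.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: raw term frequency times log(N / document frequency).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights


def chi_square(docs, labels, term, category):
    """CHI statistic of `term` for `category` from a 2x2 contingency table."""
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        present = term in doc
        in_cat = label == category
        if present and in_cat:
            a += 1  # term present, document in category
        elif present:
            b += 1  # term present, document in another category
        elif in_cat:
            c += 1  # term absent, document in category
        else:
            d += 1  # term absent, document in another category
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def micro_f1(true_labels, predicted_labels):
    """Micro-averaged F1: pool TP, FP and FN over all categories."""
    tp = fp = fn = 0
    for cat in set(true_labels) | set(predicted_labels):
        for t, p in zip(true_labels, predicted_labels):
            if p == cat and t == cat:
                tp += 1
            elif p == cat:
                fp += 1
            elif t == cat:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy corpus (stand-in tokens, not data from the paper).
docs = [
    ["match", "goal", "team"],
    ["team", "coach", "goal"],
    ["market", "stock", "bank"],
    ["bank", "loan", "market"],
]
labels = ["sports", "sports", "economy", "economy"]

weights = tfidf(docs)

# Rank the vocabulary by CHI score for one category, highest first.
vocab = sorted({t for doc in docs for t in doc})
ranked = sorted(vocab, key=lambda t: chi_square(docs, labels, t, "sports"),
                reverse=True)
print(ranked[:4])
print(micro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

In a pipeline of this kind, the top-ranked terms would form the feature set and the tf.idf weights would fill the document vectors passed to a classifier; for single-label prediction, micro-averaged F1 coincides with accuracy.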
Pages: 106-111
Page count: 6