Feature sub-set selection metrics for Arabic text classification

Cited by: 49
Author
Mesleh, Abdelwadood Moh'd [1]
Affiliation
[1] Al Balqa Appl Univ, Fac Engn Technol, Dept Comp Engn, Amman, Jordan
Keywords
Feature selection; SVM; Arabic text classification; MODEL; KNN
DOI
10.1016/j.patrec.2011.07.010
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature sub-set selection (FSS) is an important step for effective text classification (TC) systems. This paper presents an empirical comparison of seventeen traditional FSS metrics for TC tasks. The TC is restricted to a support vector machine (SVM) classifier and to Arabic articles only. The evaluation used a corpus of 7842 documents independently classified into ten categories. The experimental results are presented in terms of macro-averaged precision, macro-averaged recall and macro-averaged F1 measures. The results reveal that the Chi-square and Fallout FSS metrics work best for Arabic TC tasks. (C) 2011 Elsevier B.V. All rights reserved.
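For readers unfamiliar with the quantities named in the abstract, the following is a minimal illustrative sketch, not the paper's implementation. It assumes the standard 2x2 contingency-table form of the Chi-square term score and takes macro-averaged F1 to be the unweighted mean of per-category F1 values (the record does not say which variant the paper uses). The function names `chi_square` and `macro_scores` and the toy labels are hypothetical.

```python
def chi_square(a, b, c, d):
    """Chi-square score of one term for one category (standard 2x2 form).

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (an empty row or column): no evidence either way.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1).

    Assumption: macro F1 is the unweighted mean of the per-category F1 values.
    """
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0, 0.0, 0.0
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy, made-up data purely for illustration (not from the paper's corpus).
score = chi_square(10, 0, 0, 10)  # term appears in exactly the category's documents
macro_p, macro_r, macro_f1 = macro_scores(["a", "a", "b", "b"], ["a", "b", "b", "b"])
```

In a feature-selection setting, terms would typically be ranked by such a score and only the top-ranked ones kept as features; how the paper aggregates per-category scores is not stated in this record.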
Pages: 1922-1929
Page count: 8