Feature sub-set selection metrics for Arabic text classification

Cited by: 49
Authors
Mesleh, Abdelwadood Moh'd [1]
Affiliations
[1] Al Balqa Appl Univ, Fac Engn Technol, Dept Comp Engn, Amman, Jordan
Keywords
Feature selection; SVM; Arabic text classification; MODEL; KNN
DOI
10.1016/j.patrec.2011.07.010
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature sub-set selection (FSS) is an important step for effective text classification (TC) systems. This paper presents an empirical comparison of seventeen traditional FSS metrics for TC tasks. The comparison is restricted to the support vector machine (SVM) classifier and to Arabic articles. The evaluation uses a corpus of 7842 documents independently classified into ten categories. The experimental results are presented in terms of macro-averaged precision, macro-averaged recall and macro-averaged F1 measures. The results reveal that the Chi-square and Fallout FSS metrics work best for Arabic TC tasks. (C) 2011 Elsevier B.V. All rights reserved.
Pages: 1922-1929
Number of pages: 8
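
Illustrative note: the abstract names the Chi-square metric and the macro-averaged precision, recall and F1 measures without giving formulas. The sketch below uses the textbook Chi-square statistic over a 2x2 term/category contingency table and takes macro-averaged F1 to be the unweighted mean of the per-category F1 values. Both are assumptions; the paper itself may define them differently. The function names and the example counts are hypothetical and are not taken from the paper.

```python
from typing import Dict, Tuple


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square score of a term for one category (textbook form).

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (an empty row or column): no evidence either way.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def macro_scores(counts: Dict[str, Tuple[int, int, int]]) -> Tuple[float, float, float]:
    """Macro-averaged precision, recall and F1.

    counts maps each category to (true positives, false positives,
    false negatives). Every category gets equal weight regardless of size.
    """
    if not counts:
        raise ValueError("at least one category is required")
    precisions, recalls, f1s = [], [], []
    for tp, fp, fn in counts.values():
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    k = len(counts)
    return sum(precisions) / k, sum(recalls) / k, sum(f1s) / k


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(chi_square(10, 0, 0, 10))
    print(macro_scores({"sports": (8, 2, 2), "economy": (5, 5, 0)}))
```

In a feature selection setting of this kind, terms would typically be ranked by such a score per category and only the top-scoring ones kept before training the classifier.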