Feature sub-set selection metrics for Arabic text classification

Cited by: 49
Author
Mesleh, Abdelwadood Moh'd [1]
Affiliation
[1] Al Balqa Appl Univ, Fac Engn Technol, Dept Comp Engn, Amman, Jordan
Keywords
Feature selection; SVM; Arabic text classification; MODEL; KNN
DOI
10.1016/j.patrec.2011.07.010
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature sub-set selection (FSS) is an important step for effective text classification (TC) systems. This paper presents an empirical comparison of seventeen traditional FSS metrics for TC tasks. The comparison is restricted to a support vector machine (SVM) classifier and to Arabic articles. The evaluation used a corpus of 7842 documents independently classified into ten categories. The experimental results are presented in terms of macro-averaged precision, macro-averaged recall, and macro-averaged F1 measures. The results reveal that the Chi-square and Fallout FSS metrics work best for Arabic TC tasks. (C) 2011 Elsevier B.V. All rights reserved.
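As an illustration of two quantities named in the abstract, the following is a minimal sketch, not the paper's implementation. It assumes the standard 2x2 contingency-table form of the Chi-square term score and assumes that macro-averaged F1 means the unweighted mean of per-category F1 values; the paper may define either differently. The function names chi_square and macro_prf are hypothetical.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square score of one term for one category (standard 2x2 form, assumed).

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table: the term or category never (or always) occurs.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def macro_prf(y_true, y_pred):
    """Macro-averaged precision, recall, and F1 for single-label predictions.

    Each category counts equally, regardless of how many documents it has.
    Assumption: macro F1 is the mean of the per-category F1 values.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted category gains a false positive
            fn[truth] += 1  # true category gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        p = tp[label] / p_den if p_den else 0.0
        r = tp[label] / r_den if r_den else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    k = len(labels)
    return sum(precisions) / k, sum(recalls) / k, sum(f1s) / k
```

In an FSS pipeline of this kind, each term would typically be scored against every category, the scores combined (for example by taking the maximum), and only the top-ranked terms kept as features for the classifier. That combination step is likewise an assumption here and is not taken from the abstract.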
Pages: 1922-1929
Page count: 8