Feature sub-set selection metrics for Arabic text classification

Cited by: 49
Author
Mesleh, Abdelwadood Moh'd [1]
Affiliation
[1] Al Balqa Appl Univ, Fac Engn Technol, Dept Comp Engn, Amman, Jordan
Keywords
Feature selection; SVM; Arabic text classification; MODEL; KNN
DOI
10.1016/j.patrec.2011.07.010
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature sub-set selection (FSS) is an important step for effective text classification (TC) systems. This paper presents an empirical comparison of seventeen traditional FSS metrics for TC tasks. The comparison is restricted to the support vector machine (SVM) classifier and to Arabic articles. The evaluation uses a corpus of 7842 documents independently classified into ten categories. The experimental results are presented in terms of macro-averaged precision, macro-averaged recall and macro-averaged F1 measures. Results reveal that the Chi-square and Fallout FSS metrics work best for Arabic TC tasks. © 2011 Elsevier B.V. All rights reserved.
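The two quantities named in the abstract, the Chi-square term score and the macro-averaged measures, can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the 2x2 contingency-table layout and the choice to compute macro F1 as the mean of per-category F1 values are assumptions made for illustration, and the paper's exact formulations may differ.

```python
from typing import Dict, Tuple


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square score of one term for one category (illustrative layout).

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A degenerate table (an empty row or column) carries no evidence.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def macro_scores(counts: Dict[str, Tuple[int, int, int]]) -> Tuple[float, float, float]:
    """Macro-averaged precision, recall and F1.

    counts maps each category to (true positives, false positives,
    false negatives). Assumption: macro F1 is the unweighted mean of the
    per-category F1 values.
    """
    if not counts:
        raise ValueError("at least one category is required")
    precisions, recalls, f1s = [], [], []
    for tp, fp, fn in counts.values():
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    k = len(counts)
    return sum(precisions) / k, sum(recalls) / k, sum(f1s) / k
```

In a feature selection setting, terms would typically be ranked by such a score and only the top-scoring ones kept as features. Macro-averaging gives every category equal weight regardless of how many documents it contains.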
Pages: 1922-1929
Number of pages: 8