Feature sub-set selection metrics for Arabic text classification

Cited by: 49

Authors:

Mesleh, Abdelwadood Moh'd [1]

Affiliations:

[1] Al Blaqa Appl Univ, Fac Engn Technol, Dept Comp Engn, Amman, Jordan

Source:

PATTERN RECOGNITION LETTERS | 2011 / Vol. 32 / No. 14

Keywords:

Feature selection; SVM; Arabic text classification; MODEL; KNN;

DOI:

10.1016/j.patrec.2011.07.010

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Feature sub-set selection (FSS) is an important step for effective text classification (TC) systems. This paper presents an empirical comparison of seventeen traditional FSS metrics for TC tasks. The TC is restricted to a support vector machine (SVM) classifier and to Arabic articles only. The evaluation used a corpus of 7842 documents independently classified into ten categories. The experimental results are presented in terms of macro-averaged precision, macro-averaged recall and macro-averaged F1 measures. Results reveal that the Chi-square and Fallout FSS metrics work best for Arabic TC tasks. (C) 2011 Elsevier B.V. All rights reserved.
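
The two ingredients named in the abstract, Chi-square term scoring and macro-averaged evaluation, can be illustrated with a minimal Python sketch. It assumes the textbook 2x2 contingency form of Chi-square and takes macro-averaged F1 to be the unweighted mean of per-category F1 values; this record does not give the paper's exact formulas, and the function names `chi_square` and `macro_scores` are hypothetical.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for a category (assumed textbook form).

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (an empty row or column): no evidence either way.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def macro_scores(y_true, y_pred):
    """Return (precision, recall, F1), each an unweighted mean over categories."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    k = len(labels)
    return sum(precisions) / k, sum(recalls) / k, sum(f1s) / k
```

In a feature selection pipeline of this kind, each term would typically be scored against each category and only the highest-scoring terms kept before training the classifier; how the paper aggregates scores across its ten categories is not stated in this record.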

Pages: 1922-1929

Page count: 8
