A comparison of text classification methods using different stemming techniques

Cited by: 12
Authors
Bounabi, Mariem [1]
El Moutaouakil, Karim [2]
Satori, Khalid [1]
Affiliations
[1] USMBA Univ Fes, Comp Sci Imaging & Numer Anal Lab LIIAN, Fez City, Morocco
[2] Mohammed First Univ, Hoceima Natl Sch Appl Sci ENSAH, Al Hoceima, Morocco
Keywords
NBMU; SVM; RF; NB; SLogiF; CNB; voting technique; classification; stemmer; term-weighting
DOI
10.1504/IJCAT.2019.101171
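Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
080201 [Mechanical manufacturing and automation]
Abstract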
In information retrieval, two factors strongly affect system performance: feature extraction and the matching process. In this work, we compare three well-known stemming techniques: the Lovins stemmer, the iterated Lovins stemmer, and the Snowball stemmer. For the classification phase, we experimentally compare six methods: BNET, NBMU, CNB, RF, SLogicF, and SVM. Based on this comparison, we propose a new retrieval system that uses the voting method as a matching tool to improve on the performance of classical systems. We use the TF-IDF algorithm to extract features. The proposed systems are tested on two databases: BBCNEWS and BBCSPORT. The systems based on the Lovins stemmers and on the voting technique give the best results. For the first database, the best accuracy is obtained by the Lovins + Vote system, with a recognition rate of 97%. For the second database, the Snowball + Vote system achieves a recognition rate of 99%.
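As a rough illustration of two building blocks named in the abstract, the sketch below computes TF-IDF weights and combines classifier outputs by majority vote. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `tf_idf` and `majority_vote` are hypothetical, the weighting uses raw term frequency times log(N / df) (one common variant, which may differ from the one used in the paper), and documents are assumed to be already tokenized and stemmed.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumption: weight = raw term frequency * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


def majority_vote(predictions):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(predictions).most_common(1)[0][0]
```

For example, with the two toy documents `[["goal", "match", "team"], ["market", "bank", "team"]]`, the term `team` occurs in both and so receives weight 0, while `goal` receives log 2. Calling `majority_vote(["sport", "business", "sport"])` returns `"sport"`.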
Pages: 298-306
Page count: 9