Improving Text Classification Performance Using PCA and Recall-Precision Criteria

Cited by: 14
Authors
Zahedi, M. [1]
Sorkhi, A. Ghanbari [1]
Affiliations
[1] Shahrood Univ Technol, Shahrood, Iran
Keywords
Text classification; Term frequency and category relevancy factor; Principal component analysis; Recall and precision criteria; Nearest neighbor
DOI
10.1007/s13369-013-0569-2
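Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract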
Persian text is usually associated with a wide range of features, some important and some useless. This is the main reason why feature extraction is one of the difficult tasks in Persian text analysis and understanding. Few studies have focused on this problem; this paper introduces a novel approach for extracting the most relevant features from Persian text and classifying it. Experimental results show that using principal component analysis together with recall and precision criteria, and employing the term frequency and category relevancy factor, considerably improves the running time of the classification process, while accuracy and precision either improve slightly or do not decrease enough to affect classification performance.
Pages: 2095-2102
Page count: 8