Using kNN model for automatic text categorization

Cited by: 76
Authors
Guo, GD
Wang, H
Bell, D
Bi, YX
Greer, K
Affiliations
[1] Univ Ulster, Sch Comp & Math, Newtownabbey BT37 0QB, Antrim, North Ireland
[2] Queens Univ Belfast, Sch Comp Sci, Belfast BT7 1NN, Antrim, North Ireland
Keywords
kNN model; kNN; Rocchio; text categorization; performance
DOI
10.1007/s00500-005-0503-y
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
An investigation is conducted on two well-known similarity-based learning approaches to text categorization: the k-nearest neighbors (kNN) classifier and the Rocchio classifier. After identifying the weaknesses and strengths of each technique, a new classifier called the kNN model-based classifier (kNN Model) is proposed. It combines the strengths of both kNN and Rocchio. A text categorization prototype, which implements kNN Model along with kNN and Rocchio, is described. An experimental evaluation of the different methods is carried out on two common document corpora: the 20-newsgroup collection and the ModApte version of the Reuters-21578 collection of news stories. The experimental results show that the proposed kNN model-based method outperforms the kNN and Rocchio classifiers, and is therefore a good alternative to kNN and Rocchio in some application areas.
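For orientation, the two baseline classifiers named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' prototype. It assumes whitespace tokenization, raw term-frequency vectors, and cosine similarity, and it simplifies Rocchio to one positive-only centroid per class. The function names and the toy corpus are hypothetical. The kNN Model itself is not sketched, because the abstract does not describe its construction.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Bag-of-words term-frequency vector (assumption: whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, query, k=3):
    """kNN: majority vote among the k training documents most similar to the query."""
    q = vectorize(query)
    scored = sorted(
        ((cosine(vectorize(doc), q), label) for doc, label in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


def rocchio_fit(train):
    """Simplified Rocchio: one summed prototype vector per class.

    Assumption: negative examples are ignored. Cosine similarity is
    scale-invariant, so the sum behaves like the class mean here.
    """
    centroids = defaultdict(Counter)
    for doc, label in train:
        centroids[label].update(vectorize(doc))
    return dict(centroids)


def rocchio_predict(centroids, query):
    """Assign the class whose prototype is most similar to the query."""
    q = vectorize(query)
    return max(centroids, key=lambda label: cosine(centroids[label], q))


# Hypothetical toy corpus, for illustration only.
train = [
    ("stocks fell as oil prices rose", "finance"),
    ("the bank raised interest rates", "finance"),
    ("shares and bonds rallied on earnings", "finance"),
    ("the team won the final match", "sport"),
    ("the striker scored two goals", "sport"),
    ("coach praised the team after the match", "sport"),
]
centroids = rocchio_fit(train)
```

The contrast the abstract draws is visible in the structure: kNN keeps every training document and compares the query against all of them, while Rocchio compresses each class into a single prototype and compares the query against one vector per class.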
Pages: 423-430
Page count: 8