An improved TF-IDF approach for text classification

Cited by: 9
Authors
Zhang Yun-tao
Gong Ling
Wang Yong-cheng
Affiliations
[1] Shanghai Jiaotong University, Network & Information Center
[2] Shanghai Jiaotong University, School of Electronic & Information Technology
Source
Journal of Zhejiang University-SCIENCE A | 2005 / Vol. 6 / No. 1
Keywords
Term frequency/inverse document frequency (TF-IDF); Text classification; Confidence; Support; Characteristic words
DOI
10.1631/BF02842477
CLC number
TP31
Document code
A
Abstract
This paper presents an improved term frequency/inverse document frequency (TF-IDF) approach that uses confidence, support, and characteristic words to enhance the recall and precision of text classification. The improved approach also handles synonyms defined by a lexicon. We discuss and analyze in detail the relationship among confidence, recall, and precision. Experiments in the science and technology domain gave promising results: the new TF-IDF approach improves the precision and recall of text classification compared with the conventional TF-IDF approach.
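For orientation, the conventional TF-IDF baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's method. It assumes raw term counts for TF and log(N/df) for IDF, and it uses the usual association-rule definitions of support and confidence for a rule "term implies class". The abstract does not state the paper's own definitions, so the function names `tf_idf` and `support_confidence` and the toy data are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Conventional TF-IDF. docs is a list of token lists.

    Returns one {term: weight} dict per document, with
    weight = raw term count * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

def support_confidence(docs, labels, term, label):
    """Association-rule style support and confidence of the rule term -> label.

    Assumed definitions (not taken from the paper):
      support    = docs containing term AND labelled `label` / all docs
      confidence = docs containing term AND labelled `label` / docs containing term
    """
    with_term = [lab for doc, lab in zip(docs, labels) if term in doc]
    both = sum(1 for lab in with_term if lab == label)
    support = both / len(docs)
    confidence = both / len(with_term) if with_term else 0.0
    return support, confidence

# Toy corpus, invented for illustration only.
docs = [
    ["neural", "network", "training"],
    ["network", "router", "protocol"],
    ["protein", "folding"],
]
labels = ["cs", "cs", "bio"]

weights = tf_idf(docs)
sup, conf = support_confidence(docs, labels, "network", "cs")
```

Under these assumptions, a term that appears in only one class has confidence 1.0 for that class, which suggests why confidence could be a useful signal for selecting characteristic words.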
Pages: 49-55
Page count: 6