An improved TF-IDF approach for text classification

Cited by: 11
Authors
张云涛
龚玲
王永成
Affiliation
Network & Information Center, School of Electronic & Information Technology, Shanghai Jiaotong University, Shanghai 200030, China
Funding
National Natural Science Foundation of China
Keywords
Term frequency/inverse document frequency (TF-IDF); Text classification; Confidence; Support; Characteristic words
DOI
Not available
Chinese Library Classification number
TP391.1 [Text information processing]
Discipline classification codes
081203; 0835
Abstract
This paper presents an improved term frequency/inverse document frequency (TF-IDF) approach that uses confidence, support, and characteristic words to enhance the recall and precision of text classification. The improved approach also handles synonyms defined by a lexicon. We discuss and analyze in detail the relationship among confidence, recall, and precision. Experiments in the science and technology domain gave promising results: the new TF-IDF approach improves the precision and recall of text classification compared with the conventional TF-IDF approach.
Pages: 50-56
Page count: 7