Keyword Extraction Based on tf/idf for Chinese News Document

Cited by: 25
Authors
LI Juanzi
Affiliation
Funding
National Natural Science Foundation of China;
Keywords
keyword extraction; keyphrase extraction; news keyword;
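DOI
Not available
Chinese Library Classification number
TP391.1 [Text information processing]; TP182 [Expert systems, knowledge engineering];
Discipline classification code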
081203 ; 0835 ; 1111 ;
Abstract
Keyword extraction is an important research topic in information retrieval. This paper specifies what constitutes a keyword in Chinese news documents, based on an analysis of the linguistic characteristics of such documents, and then proposes a new keyword extraction method based on tf/idf with multiple strategies. The approach selects uni-, bi-, and tri-grams as candidate keywords and defines features according to their morphological characteristics and context information. The paper also proposes several strategies to repair incomplete words produced by word segmentation and to find unknown potential keywords in news documents. Experimental results show that the proposed method significantly outperforms the baseline method. We also apply it to retrospective event detection, where experimental results show that the accuracy and efficiency of news retrospective event detection are significantly improved.
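The candidate-selection and scoring step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes documents are already word-segmented into token lists, scores candidates with a plain tf * log(N/df) weighting, and omits the morphological and context features and the segmentation-repair strategies of the paper. The names `ngrams` and `tfidf_keywords` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, max_n=3):
    """Yield every uni-, bi-, and tri-gram of a token list as a tuple."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def tfidf_keywords(docs, doc_index, top_k=5, max_n=3):
    """Rank the candidate n-grams of docs[doc_index] by tf * idf.

    docs is a list of pre-segmented documents (lists of tokens).
    This is plain tf-idf (an assumption of this sketch), not the
    multi-strategy scoring proposed in the paper.
    """
    # Document frequency: how many documents contain each candidate.
    df = Counter()
    for tokens in docs:
        df.update(set(ngrams(tokens, max_n)))

    # Term frequency of each candidate within the target document.
    tf = Counter(ngrams(docs[doc_index], max_n))
    total = sum(tf.values())
    n_docs = len(docs)

    scores = {
        gram: (count / total) * math.log(n_docs / df[gram])
        for gram, count in tf.items()
    }
    # Sort by descending score; break ties by the n-gram itself.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]
```

Calling `tfidf_keywords(docs, 0)` returns up to five `(ngram, score)` pairs for the first document. A candidate that occurs in every document receives a score of zero, which is the behaviour that lets tf/idf suppress words common across the whole news collection.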
Pages: 917-921
Number of pages: 5
Related papers
1 item in total
  • [1] Thesaurus-Based Index Term Extraction for Agricultural Documents. Medelyan O, Witten I H. Proc of the 6th Agricultural Ontology Service (AOS) Workshop at EFITA/WCCA. 2005