Mining text using keyword distributions

Cited by: 81
Authors
Feldman, R [1]
Dagan, I
Hirsh, H
Affiliations
[1] Bar Ilan Univ, Dept Math, Ramat Gan, Israel
[2] Bar Ilan Univ, Dept Comp Sci, Ramat Gan, Israel
[3] Rutgers State Univ, Dept Comp Sci, Piscataway, NJ 08855 USA
Funding
National Science Foundation (USA)
Keywords
data mining; text mining; text categorization; distribution comparison; trend analysis
DOI
10.1023/A:1008623632443
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Knowledge Discovery in Databases (KDD) focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. This paper describes the KDT system for Knowledge Discovery in Text, in which documents are labeled by keywords, and knowledge discovery is performed by analyzing the co-occurrence frequencies of the various keywords labeling the documents. We show how this keyword-frequency approach supports a range of KDD operations, providing a suitable foundation for knowledge discovery and exploration for collections of unstructured text.
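The keyword-frequency approach described in the abstract can be illustrated with a minimal Python sketch. It assumes each document is reduced to a set of keyword labels, counts pairwise keyword co-occurrences, and builds the distribution of keywords from one category conditioned on the presence of another keyword. Comparing that conditional distribution with the unconditioned one by relative entropy (KL divergence) is an assumption made here for illustration; the abstract does not name a comparison measure, and the paper's exact definition of a keyword distribution may differ. The toy documents, the `COUNTRIES` category, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math

# Hypothetical toy collection: each document is represented only by its keyword labels.
DOCS = [
    {"iran", "usa", "oil"},
    {"iran", "iraq", "oil"},
    {"usa", "japan", "trade"},
    {"usa", "iran", "trade"},
]

# Hypothetical keyword category used for the distributions below.
COUNTRIES = {"usa", "iran", "iraq", "japan"}


def cooccurrence_counts(docs):
    """Count, for every unordered keyword pair, the documents labeled with both."""
    counts = Counter()
    for labels in docs:
        # Sorting makes each pair a canonical tuple, e.g. ("iran", "oil").
        for pair in combinations(sorted(labels), 2):
            counts[pair] += 1
    return counts


def keyword_distribution(docs, category, given=None):
    """Distribution over keywords in `category`, restricted to docs labeled `given`.

    Each keyword's weight is the number of selected documents labeled with it,
    normalized so the weights sum to 1 (an assumption of this sketch).
    """
    selected = [d for d in docs if given is None or given in d]
    counts = Counter(k for d in selected for k in d if k in category)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()} if total else {}


def relative_entropy(p, q):
    """KL divergence D(p || q) in bits; assumes q[k] > 0 wherever p[k] > 0."""
    return sum(pk * math.log2(pk / q[k]) for k, pk in p.items() if pk > 0)


overall = keyword_distribution(DOCS, COUNTRIES)
given_oil = keyword_distribution(DOCS, COUNTRIES, given="oil")
print(f"{relative_entropy(given_oil, overall):.3f}")  # prints 0.311
```

A larger divergence means that conditioning on the keyword shifts the category's distribution further from its baseline, which is the kind of pattern a distribution-comparison operation would surface. Because the conditional distribution is computed over a subset of the same documents, its support is always contained in that of the baseline, so the divergence is well defined here.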
Pages: 281-300
Page count: 20