Mining text using keyword distributions

Cited by: 81
Authors
Feldman, R [1]
Dagan, I
Hirsh, H
Affiliations
[1] Bar Ilan Univ, Dept Math, Ramat Gan, Israel
[2] Bar Ilan Univ, Dept Comp Sci, Ramat Gan, Israel
[3] Rutgers State Univ, Dept Comp Sci, Piscataway, NJ 08855 USA
Funding
U.S. National Science Foundation
Keywords
data mining; text mining; text categorization; distribution comparison; trend analysis;
DOI
10.1023/A:1008623632443
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Knowledge Discovery in Databases (KDD) focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. This paper describes the KDT system for Knowledge Discovery in Text, in which documents are labeled by keywords, and knowledge discovery is performed by analyzing the co-occurrence frequencies of the various keywords labeling the documents. We show how this keyword-frequency approach supports a range of KDD operations, providing a suitable foundation for knowledge discovery and exploration for collections of unstructured text.
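The keyword co-occurrence analysis mentioned in the abstract can be illustrated with a minimal sketch. This is not the KDT system's implementation: the function names `cooccurrence_counts` and `keyword_share` and the toy keyword labels are assumptions introduced here for illustration only.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count, for each unordered keyword pair, how many documents carry both labels.

    `docs` is an iterable of keyword collections, one per document
    (a hypothetical input format, not taken from the paper).
    """
    counts = Counter()
    for keywords in docs:
        # Sort so each pair has one canonical ordering; set() drops duplicate labels.
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts


def keyword_share(docs, given, candidates):
    """Among documents labeled with `given`, return the fraction also labeled
    with each candidate keyword (a simple conditional keyword distribution)."""
    selected = [set(k) for k in docs if given in k]
    if not selected:
        return {c: 0.0 for c in candidates}
    return {c: sum(c in k for k in selected) / len(selected) for c in candidates}


# Toy collection: each document is represented only by its keyword labels.
docs = [
    {"iran", "oil"},
    {"iran", "oil", "opec"},
    {"usa", "oil"},
]

pair_counts = cooccurrence_counts(docs)
iran_profile = keyword_share(docs, "iran", ["oil", "opec"])
```

Comparing profiles such as `iran_profile` across keywords, or across time slices of a collection, is one plausible reading of the "distribution comparison" and "trend analysis" keywords listed for the paper.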
Pages: 281-300
Number of pages: 20