Levelwise search and borders of theories in knowledge discovery

被引:371
作者
Mannila, H [1 ]
Toivonen, H [1 ]
机构
[1] Univ Helsinki, Dept Comp Sci, FIN-00014 Helsinki, Finland
基金
芬兰科学院;
关键词
theory of knowledge discovery; association rules; episodes; integrity constraints; hypergraph transversals;
D O I
10.1023/A:1009796218281
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
One of the basic problems in knowledge discovery in databases (KDD) is the following: given a data set r, a class L of sentences for defining subgroups of r, and a selection predicate, find all sentences of L: deemed interesting by the selection predicate. We analyze the simple levelwise algorithm for finding all such descriptions. We give bounds for the number of database accesses that the algorithm makes. For this, we introduce the concept of the border of a theory, a notion that turns out to be surprisingly powerful in analyzing the algorithm. We also consider the verification problem of a KDD process: given r and a set of sentences S subset of or equal to L, determine whether S is exactly the set of interesting statements about r. We show strong connections between the verification problem and the hypergraph transversal problem. The verification problem arises in a natural way when using sampling to speed up the pattern discovery step in KDD.
引用
收藏
页码:241 / 258
页数:18
相关论文
共 40 条
  • [1] Agrawal R., 1993, SIGMOD Record, V22, P207, DOI 10.1145/170036.170072
  • [2] Agrawal R., 1996, Adv. Knowl. Discov. Data Min, P307
  • [3] [Anonymous], P PYOC ACM SIGMOD IN
  • [4] [Anonymous], 1973, HYPERGRAPHS COMBINAT
  • [5] [Anonymous], 1995, P 1 SIGKDD INT C KNO
  • [6] [Anonymous], P INT C VER LARG DAT
  • [7] BELL S, 1995, P 1 INT C KNOWL DISC, P27
  • [8] BRIN S, 1997, P ACM SIGMOD INT C M, P265
  • [9] Chang C. C., 1990, MODEL THEORY, V73
  • [10] FIRST-ORDER JK-CLAUSAL THEORIES ARE PAC-LEARNABLE
    DERAEDT, L
    DZEROSKI, S
    [J]. ARTIFICIAL INTELLIGENCE, 1994, 70 (1-2) : 375 - 392