Detecting group differences: Mining contrast sets

Cited by: 218
Authors
Bay, SD [1]
Pazzani, MJ [1]
Affiliations
[1] Univ Calif Irvine, Dept Informat & Comp Sci, Irvine, CA 92697 USA
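Funding
National Science Foundation (USA)
Keywords
data mining; contrast sets; change detection; association rules
DOI
10.1023/A:1011429418057
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract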
A fundamental task in data analysis is understanding the differences between several contrasting groups. These groups can represent different classes of objects, such as male or female students, or the same group over time, e.g., freshman students in 1993 through 1998. We present the problem of mining contrast sets: conjunctions of attributes and values that differ meaningfully in their distribution across groups. We provide a search algorithm for mining contrast sets with pruning rules that drastically reduce the computational complexity. Once the contrast sets are found, we post-process the results to present a subset that is surprising to the user given what we have already shown. We explicitly control the probability of Type I error (false positives) and guarantee a maximum error rate for the entire analysis by using Bonferroni corrections.
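To make the idea of a contrast set concrete, the following is a minimal sketch, not the authors' algorithm. It compares how often one candidate conjunction of attribute-value pairs occurs in two groups, then applies a Bonferroni-corrected threshold. The abstract does not name the significance test, so the Pearson chi-square test used here is an assumption, as are the function names, the "group" key, and the restriction to two groups.

```python
import math


def support(rows, group, contrast_set):
    """Count rows of `group` matching every attribute=value pair.

    Returns (matching rows, total rows in the group).
    `rows` is a list of dicts with a hypothetical "group" key.
    """
    members = [r for r in rows if r["group"] == group]
    hits = sum(
        all(r.get(attr) == value for attr, value in contrast_set.items())
        for r in members
    )
    return hits, len(members)


def chi2_p_value_two_groups(hits_a, n_a, hits_b, n_b):
    """Pearson chi-square p-value for a 2x2 table (assumed test).

    Rows of the table: matches / does not match the contrast set.
    Columns of the table: group A / group B.
    """
    observed = [[hits_a, hits_b], [n_a - hits_a, n_b - hits_b]]
    total = n_a + n_b
    row_sums = [sum(row) for row in observed]
    col_sums = [n_a, n_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            if expected > 0:
                chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def is_significant(p_value, alpha, n_tests):
    """Bonferroni correction: compare against alpha / number of tests."""
    return p_value < alpha / n_tests


# Hypothetical data: does {"major": "cs"} separate the two cohorts?
rows = (
    [{"group": "1993", "major": "cs"}] * 8
    + [{"group": "1993", "major": "bio"}] * 2
    + [{"group": "1998", "major": "cs"}] * 2
    + [{"group": "1998", "major": "bio"}] * 8
)
candidate = {"major": "cs"}
hits_a, n_a = support(rows, "1993", candidate)
hits_b, n_b = support(rows, "1998", candidate)
p = chi2_p_value_two_groups(hits_a, n_a, hits_b, n_b)
print(is_significant(p, alpha=0.05, n_tests=5))
```

Dividing alpha by the number of candidates tested keeps the chance of any false positive across the whole analysis at or below alpha, which is the guarantee the abstract attributes to the Bonferroni corrections.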
Pages: 213-246
Page count: 34