Implications of probabilistic data modeling for mining association rules

Cited by: 27
Authors
Hahsler, M [1]
Hornik, K
Reutterer, T
Affiliations
[1] Vienna Univ Econ & Business Adm, Dept Informat Syst & Operat, A-1090 Vienna, Austria
[2] Vienna Univ Econ & Business Adm, Dept Stat & Math, A-1090 Vienna, Austria
[3] Vienna Univ Econ & Business Adm, Dept Retailing & Mkt, A-1090 Vienna, Austria
Source
FROM DATA AND INFORMATION ANALYSIS TO KNOWLEDGE ENGINEERING | 2006
DOI
10.1007/3-540-31314-1_73
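Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104 [Pattern Recognition and Intelligent Systems]; 0812 [Computer Science and Technology]; 0835 [Software Engineering]; 1405 [Intelligent Science and Technology]
Abstract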
Mining association rules is an important technique for discovering meaningful patterns in transaction databases. In the current literature, the properties of algorithms to mine association rules are discussed in great detail. We present a simple probabilistic framework for transaction data which can be used to simulate transaction data when no associations are present. We use such data and a real-world grocery database to explore the behavior of confidence and lift, two popular interest measures used for rule mining. The results show that confidence is systematically influenced by the frequency of the items in the left-hand side of rules and that lift performs poorly at filtering random noise in transaction data. The probabilistic data modeling approach presented in this paper is not only a valuable framework for analyzing interest measures but also a starting point for further research into new interest measures that are based on statistical tests and geared towards the specific properties of transaction data.
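The idea of simulating transaction data without associations and then inspecting confidence and lift can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that each item enters a transaction independently with a fixed probability, and the function names (simulate_transactions, support, confidence, lift) and the item probabilities are hypothetical choices made for illustration only.

```python
import random


def simulate_transactions(item_probs, n_transactions, seed=0):
    """Draw transactions in which every item occurs independently.

    Assumption (not taken from the paper): item i is included in each
    transaction with fixed probability item_probs[i], so any association
    found in the result is random noise.
    """
    rng = random.Random(seed)
    return [
        frozenset(item for item, p in item_probs.items() if rng.random() < p)
        for _ in range(n_transactions)
    ]


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, lhs, rhs):
    """Estimate of P(rhs | lhs): supp(lhs and rhs) / supp(lhs)."""
    supp_lhs = support(transactions, lhs)
    if supp_lhs == 0:
        return float("nan")  # rule is undefined when lhs never occurs
    return support(transactions, set(lhs) | set(rhs)) / supp_lhs


def lift(transactions, lhs, rhs):
    """Confidence divided by supp(rhs); close to 1 under independence."""
    supp_rhs = support(transactions, rhs)
    if supp_rhs == 0:
        return float("nan")
    return confidence(transactions, lhs, rhs) / supp_rhs


if __name__ == "__main__":
    # Hypothetical item frequencies: one frequent, one moderate, one rare item.
    probs = {"a": 0.5, "b": 0.2, "c": 0.01}
    data = simulate_transactions(probs, n_transactions=20000, seed=1)
    for lhs, rhs in [("a", "b"), ("b", "a"), ("c", "b")]:
        print(lhs, "->", rhs,
              "confidence:", confidence(data, [lhs], [rhs]),
              "lift:", lift(data, [lhs], [rhs]))
```

Because the items are generated independently, the lift of a rule should be near 1 when both sides are frequent, while rules that involve rare items rest on very few transactions and their estimates can stray much further from 1 by chance.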
Pages: 598 / +
Page count: 2