Mining Skewed and Sparse Transaction Data for Personalized Shopping Recommendation

Cited by: 9
Authors
Chun-Nan Hsu
Hao-Hsiang Chung
Han-Shen Huang
Affiliations
[1] Academia Sinica, Institute of Information Science
[2] National Taiwan University, Department of Computer Science and Information Engineering
Source
Machine Learning | 2004 / Vol. 57
Keywords
graphical models; user profiles; collaborative filtering; shopping recommendation; transaction data
DOI
Not available
Abstract
A good shopping recommender system can boost sales at a retail store. To provide accurate recommendations, the recommender needs to accurately predict a customer's preferences, an ability that is difficult to acquire. Conventional data mining techniques, such as association rule mining and collaborative filtering, can generally be applied to this problem, but rarely produce satisfying results due to the skewness and sparsity of transaction data. In this paper, we report the lessons that we learned in two real-world data mining applications for personalized shopping recommendation. We learned that extending a rating-based collaborative filtering method (e.g., GroupLens) to perform personalized shopping recommendation is not trivial, and that it is not appropriate to apply association-rule-based methods (e.g., the IBM SmartPad system) for large-scale prediction of customers' shopping preferences. Instead, a probabilistic graphical model can be more effective in handling skewed and sparse data. By casting collaborative filtering algorithms in a probabilistic framework, we derived HyPAM (Hybrid Poisson Aspect Modelling), a novel probabilistic graphical model for personalized shopping recommendation. Experimental results show that HyPAM outperforms GroupLens and the IBM method by generating much more accurate predictions of what items a customer will actually purchase in the unseen test data. The data sets and the results are made available for download at http://chunnan.iis.sinica.edu.tw/hypam/HyPAM.html.
Pages: 35-59
Page count: 24