A study of mixture models for collaborative filtering

Cited by: 61
Authors
Jin, Rong [1]
Si, Luo
Zhai, Chengxiang
Affiliations
[1] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
[2] Michigan State Univ, Dept Comp Sci & Engn, E Lansing, MI 48824 USA
[3] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
Source
INFORMATION RETRIEVAL | 2006 / Vol. 9 / Issue 03
Keywords
collaborative filtering; graphical model; probabilistic model
DOI
10.1007/s10791-006-4651-1
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Collaborative filtering is a general technique for exploiting the preference patterns of a group of users to predict the utility of items for a particular user. Three different components need to be modeled in a collaborative filtering problem: users, items, and ratings. Previous research on applying probabilistic models to collaborative filtering has shown promising results. However, there is a lack of systematic studies of different ways to model each of the three components and their interactions. In this paper, we conduct a broad and systematic study of different mixture models for collaborative filtering. We discuss general issues related to using a mixture model for collaborative filtering, and propose three properties that a graphical model is expected to satisfy. Using these properties, we thoroughly examine five different mixture models: Bayesian Clustering (BC), the Aspect Model (AM), the Flexible Mixture Model (FMM), the Joint Mixture Model (JMM), and the Decoupled Model (DM). We compare these models both analytically and experimentally. Experiments over two datasets of movie ratings under different configurations show that, in general, whether a model satisfies the proposed properties tends to be correlated with its performance. In particular, the Decoupled Model, which satisfies all three desired properties, outperforms the other mixture models as well as many other existing approaches for collaborative filtering. Our study shows that graphical models are powerful tools for modeling collaborative filtering, but careful design is necessary to achieve good performance.
Pages: 357-382
Page count: 26