Attentive Collaborative Filtering: Multimedia Recommendation with Item- and Component-Level Attention

Cited by: 677
Authors
Chen, Jingyuan [1 ]
Zhang, Hanwang [2 ]
He, Xiangnan [1 ]
Nie, Liqiang [3 ]
Liu, Wei [4 ]
Chua, Tat-Seng [1 ]
Affiliations
[1] Natl Univ Singapore, Singapore, Singapore
[2] Columbia Univ, New York, NY 10027 USA
[3] Shandong Univ, Jinan, Shandong, Peoples R China
[4] Tencent AI Lab, Bellevue, WA USA
Source
SIGIR'17: PROCEEDINGS OF THE 40TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL | 2017
Funding
National Research Foundation of Singapore
Keywords
Collaborative Filtering; Implicit Feedback; Attention; Multimedia Recommendation;
DOI
10.1145/3077136.3080797
CLC number
TP [Automation Technology, Computer Technology]
Discipline code
080201 [Mechanical Manufacturing and Automation]
Abstract
Multimedia content is dominating today's Web information. The nature of multimedia user-item interactions is 1/0 binary implicit feedback (e.g., photo likes, video views, song downloads), which can be collected at a larger scale with a much lower cost than explicit feedback (e.g., product ratings). However, the majority of existing collaborative filtering (CF) systems are not well designed for multimedia recommendation, since they ignore the implicitness in users' interactions with multimedia content. We argue that, in multimedia recommendation, there exists item- and component-level implicitness which blurs the underlying users' preferences. The item-level implicitness means that users' preferences on items (e.g., photos, videos, songs) are unknown, while the component-level implicitness means that inside each item users' preferences on different components (e.g., regions in an image, frames of a video) are unknown. For example, a "view" on a video does not provide any specific information about how much the user likes the video (i.e., item-level) or which parts of the video the user is interested in (i.e., component-level). In this paper, we introduce a novel attention mechanism in CF to address the challenging item- and component-level implicit feedback in multimedia recommendation, dubbed Attentive Collaborative Filtering (ACF). Specifically, our attention model is a neural network that consists of two attention modules: the component-level attention module, starting from any content feature extraction network (e.g., a CNN for images/videos), which learns to select informative components of multimedia items, and the item-level attention module, which learns to score the item preferences. ACF can be seamlessly incorporated into classic CF models with implicit feedback, such as BPR and SVD++, and efficiently trained using SGD.
Through extensive experiments on two real-world multimedia Web services: Vine and Pinterest, we show that ACF significantly outperforms state-of-the-art CF methods.
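The two-level design described in the abstract can be illustrated with a minimal NumPy sketch: component-level attention pools an item's content features (e.g., CNN region features) into one content vector per item, and item-level attention weights the user's interacted items to augment the user representation, in the spirit of the SVD++-style neighborhood term. All names, dimensions, and the parameter-free dot-product scoring below are illustrative assumptions; the actual ACF model uses learned attention networks with their own parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d = 8        # latent dimension (illustrative)
n_comp = 5   # components per item (e.g., image regions, video frames)
n_items = 4  # items the user has interacted with

# Hypothetical pretrained pieces: user embedding, item embeddings, and
# per-component content features from a feature-extraction backbone.
u = rng.normal(size=d)
item_emb = rng.normal(size=(n_items, d))
comp_feat = rng.normal(size=(n_items, n_comp, d))

# Component-level attention: score each item's components against the
# user, then pool them into one content vector per item.
comp_weights = softmax(comp_feat @ u, axis=1)                # (n_items, n_comp)
content = (comp_weights[..., None] * comp_feat).sum(axis=1)  # (n_items, d)

# Item-level attention: weight the user's interacted items (the
# SVD++-style implicit-feedback term) and add them to the user vector.
item_weights = softmax((item_emb + content) @ u)             # (n_items,)
u_aug = u + (item_weights[:, None] * (item_emb + content)).sum(axis=0)

# Preference score for a candidate item embedding j (inner product).
j = rng.normal(size=d)
score = u_aug @ j
```

In the paper the attention scores are produced by small neural networks rather than raw dot products, and the whole model is trained end-to-end with a pairwise BPR loss via SGD; this sketch only shows how the two attention levels compose at inference time.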
Pages: 335-344
Page count: 10