Temporal Analytics on Big Data for Web Advertising

Citations: 38
Authors
Chandramouli, Badrish [1]
Goldstein, Jonathan [2]
Duan, Songyun [1, 3]
Affiliations
[1] Microsoft Res, Redmond, WA USA
[2] Microsoft Corp, Redmond, WA 98052 USA
[3] IBM Corp, T J Watson Res, Hawthorne, NY 10504 USA
Source
2012 IEEE 28TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE) | 2012
DOI
10.1109/ICDE.2012.55
Chinese Library Classification
TP301 [Theory, Methods];
Discipline Classification Code
080201 [Mechanical Manufacturing and Automation];
Abstract
"Big Data" in map-reduce (M-R) clusters is often fundamentally temporal in nature, as are many analytics tasks over such data. For instance, display advertising uses Behavioral Targeting (BT) to select ads for users based on prior searches, page views, etc. Previous work on BT has focused on techniques that scale well for offline data using M-R. However, this approach has limitations for BT-style applications that deal with temporal data: (1) many queries are temporal and not easily expressible in M-R, and moreover, the set-oriented nature of M-R front-ends such as SCOPE is not suitable for temporal processing; (2) as commercial systems mature, they may need to also directly analyze and react to real-time data feeds since a high turnaround time can result in missed opportunities, but it is difficult for current solutions to naturally also operate over real-time streams. Our contributions are twofold. First, we propose a novel framework called TiMR (pronounced timer), that combines a time-oriented data processing system with a M-R framework. Users perform analytics using temporal queries - these queries are succinct, scale-out-agnostic, and easy to write. They scale well on large-scale offline data using TiMR, and can work unmodified over real-time streams. We also propose new cost-based query fragmentation and temporal partitioning schemes for improving efficiency with TiMR. Second, we show the feasibility of this approach for BT, with new temporal algorithms that exploit new targeting opportunities. Experiments using real advertising data show that TiMR is efficient and incurs orders-of-magnitude lower development effort. Our BT solution is easy and succinct, and performs up to several times better than current schemes in terms of memory, learning time, and click-through-rate/coverage.
Pages: 90-101
Page count: 12