Detecting Anomalies in Dynamic Rating Data: A Robust Probabilistic Model for Rating Evolution

Times cited: 28
Authors
Gunnemann, Stephan [1]
Gunnemann, Nikou [1]
Faloutsos, Christos [1]
Affiliation
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Source
PROCEEDINGS OF THE 20TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (KDD'14) | 2014
Keywords
robust mining; anomaly detection; categorical mixtures
DOI
10.1145/2623330.2623721
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Rating data is ubiquitous on websites such as Amazon, TripAdvisor, or Yelp. Since ratings are not static but are given at various points in time, a temporal analysis of rating data provides deeper insights into the evolution of a product's quality. In this work, we tackle the following question: given the time-stamped rating data for a product or service, how can we detect the general rating behavior of users as well as the time intervals in which the ratings behave anomalously? We propose a Bayesian model that represents the rating data as a sequence of categorical mixture models. In contrast to existing methods, our method does not require any aggregation of the input; it operates directly on the original time-stamped data. To capture the dynamic effects of the ratings, the categorical mixtures are temporally constrained: anomalies can occur only in specific time intervals, and the general rating behavior should evolve smoothly over time. Our method automatically determines the intervals in which anomalies occur, and it captures the temporal effects of the general behavior by using a state space model on the natural parameters of the categorical distributions. For learning our model, we propose an efficient algorithm that combines principles from variational inference and dynamic programming. In our experimental study, we show the effectiveness of our method and present interesting discoveries on multiple real-world datasets.
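The Python sketch below illustrates, on toy data, two ideas named in the abstract: a general rating behavior whose categorical distribution is the softmax of natural parameters that drift smoothly over time (a Gaussian random walk, one simple state space model), and anomalies confined to time intervals that are located with dynamic programming. It is not the authors' algorithm. The paper learns its model with variational inference combined with dynamic programming, whereas this sketch assumes the general distribution, the anomaly distribution `q`, and the mixing weight `pi` are already known, and it substitutes a simple penalized interval-selection dynamic program. All function names (`softmax`, `simulate`, `anomaly_gain`, `best_intervals`), the 5-star scale, `sigma`, `penalty`, and the example values are hypothetical.

```python
import math
import random

K = 5  # rating levels 1..5 (assumption: a 5-star scale)


def softmax(eta):
    """Map natural parameters eta to categorical probabilities."""
    m = max(eta)  # subtract the max for numerical stability
    exps = [math.exp(e - m) for e in eta]
    total = sum(exps)
    return [e / total for e in exps]


def simulate(T, per_step, anomaly, q, pi, sigma=0.1, seed=0):
    """Toy generator (hypothetical): the natural parameters of the general
    behavior follow a Gaussian random walk; inside the half-open interval
    anomaly = (start, end), each rating comes from q with probability pi."""
    rng = random.Random(seed)
    eta = [0.0, 0.0, 0.5, 1.0, 1.0]  # arbitrary starting natural parameters
    start, end = anomaly
    general, data = [], []
    for t in range(T):
        eta = [e + rng.gauss(0.0, sigma) for e in eta]  # smooth evolution
        p = softmax(eta)
        general.append(p)
        ratings = []
        for _ in range(per_step):
            dist = q if start <= t < end and rng.random() < pi else p
            ratings.append(rng.choices(range(1, K + 1), weights=dist)[0])
        data.append(ratings)
    return general, data


def anomaly_gain(ratings, p, q, pi):
    """Log-likelihood gain of explaining one step's ratings with the
    mixture (1 - pi) * p + pi * q instead of with p alone."""
    return sum(math.log((1 - pi) * p[r - 1] + pi * q[r - 1]) - math.log(p[r - 1])
               for r in ratings)


def best_intervals(gain, penalty):
    """Choose non-overlapping half-open intervals [s, e) that maximize the
    summed gain minus a fixed penalty per interval (O(T^2) dynamic program)."""
    T = len(gain)
    prefix = [0.0]
    for g in gain:
        prefix.append(prefix[-1] + g)
    best = [0.0] * (T + 1)   # best[t]: optimal score for steps 0..t-1
    back = [None] * (T + 1)  # None: step t-1 is normal; s: interval [s, t)
    for t in range(1, T + 1):
        best[t], back[t] = best[t - 1], None
        for s in range(t):
            cand = best[s] + prefix[t] - prefix[s] - penalty
            if cand > best[t]:
                best[t], back[t] = cand, s
    intervals, t = [], T
    while t > 0:  # trace the back pointers from the end
        if back[t] is None:
            t -= 1
        else:
            intervals.append((back[t], t))
            t = back[t]
    return intervals[::-1]


q = [0.92, 0.02, 0.02, 0.02, 0.02]  # hypothetical burst of 1-star ratings
general, data = simulate(T=40, per_step=30, anomaly=(15, 25), q=q, pi=0.5)
gains = [anomaly_gain(r, p, q, 0.5) for r, p in zip(data, general)]
found = best_intervals(gains, penalty=5.0)
print(found)
```

In this sketch the per-interval `penalty` acts like a prior against many short anomalous intervals: larger values favor fewer, longer intervals, while a penalty of zero would mark every step with positive gain as anomalous.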
Pages: 841-850
Number of pages: 10