Taming Uncertainty in Big Data Evidence from Social Media in Urban Areas

Cited by: 11
Authors
Bendler, Johannes [1]
Wagner, Sebastian [1]
Brandt, Tobias [1]
Neumann, Dirk [1]
Affiliations
[1] Univ Freiburg, D-79098 Freiburg, Germany
Source
BUSINESS & INFORMATION SYSTEMS ENGINEERING | 2014, Vol. 6, No. 5
Keywords
Big data; Uncertainty; Social media; Veracity; Spatio-temporal patterns; Points of interest
DOI
10.1007/s12599-014-0342-4
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
While the classic definition of Big Data included the dimensions volume, velocity, and variety, a fourth dimension, veracity, has recently come to the attention of researchers and practitioners. The increasing amount of user-generated data associated with the rise of social media emphasizes the need for methods to deal with the uncertainty inherent to these data sources. In this paper we address one aspect of uncertainty by developing a new methodology to establish the reliability of user-generated data based upon causal links with recurring patterns. We associate a large data set of geo-tagged Twitter messages in San Francisco with points of interest, such as bars, restaurants, or museums, within the city. This model is validated by causal relationships between a point of interest and the amount of messages in its vicinity. We subsequently analyze the behavior of these messages over time using a jackknifing procedure to identify categories of points of interest that exhibit consistent patterns over time. Ultimately, we condense this analysis into an indicator that gives evidence on the certainty of a data set based on these causal relationships and recurring patterns in temporal and spatial dimensions.
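The abstract mentions a jackknifing procedure but does not spell it out. As a rough illustration only, the following is a generic delete-one jackknife in Python, not the authors' actual method. The function name `jackknife`, the variable `weekly_counts`, and the sample numbers are all hypothetical and do not come from the paper.

```python
from math import sqrt
from statistics import mean


def jackknife(values, statistic=mean):
    """Generic delete-one jackknife: returns (estimate, standard error).

    Illustrative sketch only; the paper's own procedure may differ.
    """
    values = list(values)
    n = len(values)
    if n < 2:
        raise ValueError("jackknife needs at least two observations")
    # Recompute the statistic n times, each time leaving one observation out.
    replicates = [statistic(values[:i] + values[i + 1:]) for i in range(n)]
    center = mean(replicates)
    # Standard jackknife standard-error formula.
    std_error = sqrt((n - 1) / n * sum((r - center) ** 2 for r in replicates))
    return center, std_error


# Hypothetical weekly message counts near one category of points of interest.
weekly_counts = [120, 135, 128, 131, 126]
estimate, std_error = jackknife(weekly_counts)
print(f"estimate={estimate:.2f}, standard error={std_error:.2f}")
```

Under this sketch, a small standard error relative to the estimate would suggest that the counts for a category are stable across the left-out periods. How that stability is turned into the paper's certainty indicator is not specified in the abstract.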
Pages: 279-288
Page count: 10