Predicting Crowd Behavior with Big Public Data

Cited by: 37
Authors
Kallus, Nathan [1]
Affiliations
[1] MIT, 77 Massachusetts Ave E40-149, Cambridge, MA 02139 USA
Source
WWW'14 COMPANION: PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON WORLD WIDE WEB | 2014
Keywords
Web and social media mining; Twitter analysis; Crowd behavior; Forecasting; Event extraction; Temporal analytics; Sentiment analysis; Online activism;
DOI
10.1145/2567948.2579233
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
With public information becoming widely accessible and shared on today's web, greater insights are possible into crowd actions by citizens and non-state actors such as large protests and cyber activism. We present efforts to predict the occurrence, specific timeframe, and location of such actions before they occur based on public data collected from over 300,000 open content web sources in 7 languages, from all over the world, ranging from mainstream news to government publications to blogs and social media. Using natural language processing, event information is extracted from content such as type of event, what entities are involved and in what role, sentiment and tone, and the occurrence time range of the event discussed. Statements made on Twitter about a future date from the time of posting prove particularly indicative. We consider in particular the case of the 2013 Egyptian coup d'etat. The study validates and quantifies the common intuition that data on social media (beyond mainstream news sources) are able to predict major events.
Pages: 625-630
Number of pages: 6
Related papers
14 in total
  • [1] [Anonymous], WI IAT
  • [2] Random forests
    Breiman, L
    [J]. MACHINE LEARNING, 2001, 45 (01) : 5 - 32
  • [3] Predicting the Present with Google Trends
    Choi, Hyunyoung
    Varian, Hal
    [J]. ECONOMIC RECORD, 2012, 88 : 2 - 9
  • [4] In Search of Attention
    Da, Zhi
    Engelberg, Joseph
    Gao, Pengjie
    [J]. JOURNAL OF FINANCE, 2011, 66 (05) : 1461 - 1499
  • [5] Predicting consumer behavior with Web search
    Goel, Sharad
    Hofman, Jake M.
    Lahaie, Sebastien
    Pennock, David M.
    Watts, Duncan J.
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2010, 107 (41) : 17486 - 17490
  • [6] The Dynamics of Protest Recruitment through an Online Network
    Gonzalez-Bailon, Sandra
    Borge-Holthoefer, Javier
    Rivero, Alejandro
    Moreno, Yamir
    [J]. SCIENTIFIC REPORTS, 2011, 1
  • [7] How to build a WebFountain: An architecture for very large-scale text analytics
    Gruhl, D
    Chavet, L
    Gibson, D
    Meyer, J
    Pattanayak, P
    Tomkins, A
    Zien, J
    [J]. IBM SYSTEMS JOURNAL, 2004, 43 (01) : 64 - 77
  • [8] Gruhl D., 2005, SIGKDD
  • [9] Kallus N., 2014, PREDICTING CROWD BEH
  • [10] MaltParser: A language-independent system for data-driven dependency parsing
    Nivre, Joakim
    Hall, Johan
    Nilsson, Jens
    Chanev, Atanas
    Eryigit, Gülşen
    Kübler, Sandra
    Marinov, Svetoslav
    Marsi, Erwin
    [J]. Natural Language Engineering, 2007, 13 (02) : 95 - 135