Nowcasting Events from the Social Web with Statistical Learning

Cited by: 118
Authors
Lampos, Vasileios [1]
Cristianini, Nello [1]
Affiliation
[1] Univ Bristol, Bristol BS8 1TH, Avon, England
Funding
Engineering and Physical Sciences Research Council (UK);
Keywords
Algorithms; Design; Experimentation; Measurement; Performance; Event detection; feature selection; LASSO; social network mining; sparse learning; Twitter; model selection
DOI
10.1145/2337542.2337557
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a general methodology for inferring the occurrence and magnitude of an event or phenomenon by exploring the large volume of unstructured textual information on the social part of the Web. Using geotagged user posts from the microblogging service Twitter as input data, we investigate two case studies. The first is a benchmark problem, in which actual rainfall levels at a given location and time are inferred from the content of tweets. The second is a real-life task, in which we infer regional Influenza-like Illness rates in an effort to detect an emerging epidemic in a timely manner. Our analysis builds on a statistical learning framework that performs sparse learning via the bootstrapped version of LASSO to select a consistent subset of textual features from a large number of candidates. In both case studies, the selected features are semantically close to the target topics, and inference, conducted by regression, performs well, especially given the short length (approximately one year) of the Twitter data time series.
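The feature-selection idea named in the abstract (bootstrapped LASSO followed by regression on the selected features) can be illustrated with a minimal sketch. This is not the authors' implementation: the coordinate-descent solver, the function names `lasso_cd` and `bootstrap_lasso_support`, the `consensus` threshold, the penalty value, and the synthetic data standing in for term frequencies are all assumptions made for illustration only.

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by coordinate descent."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n      # per-feature mean squared value
    residual = y.astype(float).copy()      # equals y - X @ w while w == 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # correlation of feature j with the residual that excludes feature j
            rho = X[:, j] @ residual / n + col_sq[j] * w[j]
            # soft-thresholding sets small coefficients exactly to zero
            w_new = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            residual += X[:, j] * (w[j] - w_new)
            w[j] = w_new
    return w

def bootstrap_lasso_support(X, y, alpha, n_boot=20, consensus=1.0, seed=0):
    """Keep features that LASSO selects in at least `consensus` of the resamples."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # sample rows with replacement
        counts += lasso_cd(X[idx], y[idx], alpha) != 0
    return np.flatnonzero(counts / n_boot >= consensus)

# Synthetic stand-in data (hypothetical): 30 candidate features, 3 informative.
rng = np.random.default_rng(42)
X = rng.standard_normal((200, 30))
true_w = np.zeros(30)
true_w[:3] = [3.0, -2.0, 1.5]
y = X @ true_w + 0.5 * rng.standard_normal(200)

selected = bootstrap_lasso_support(X, y, alpha=0.2)
# Inference step: ordinary least squares restricted to the selected features.
weights, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
print("selected features:", selected)
print("regression weights:", np.round(weights, 2))
```

With `consensus=1.0` a feature must survive every resample, which corresponds to intersecting the supports; a lower threshold would give a softer variant. Refitting by least squares on the selected features removes the shrinkage that the LASSO penalty applies to the retained weights.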
Pages: 22