What are we 'tweeting' about obesity? Mapping tweets with topic modeling and Geographic Information System

Cited by: 144
Authors
Ghosh, Debarchana [1 ]
Guha, Rajarshi [2 ]
Affiliations
[1] Univ Connecticut, Dept Geog, Storrs, CT 06040 USA
[2] NIH, Ctr Adv Translat Sci, Rockville, MD 20850 USA
Keywords
mapping; social media; topic models; GIS; text mining; obesity;
DOI
10.1080/15230406.2013.776210
Chinese Library Classification (CLC)
P9 [Physical Geography]; K9 [Geography];
Discipline classification code
0705 ; 070501 ;
Abstract
Public health-related tweets are difficult to identify in large conversational datasets such as Twitter.com. Even more challenging is the visualization and analysis of the spatial patterns encoded in tweets. This study addresses the following questions: How can topic modeling be used to identify relevant public health topics such as obesity on Twitter.com? What are the common obesity-related themes? What is the spatial pattern of the themes? What are the research challenges of using large conversational datasets from social networking sites? Obesity is chosen as a test theme to demonstrate the effectiveness of topic modeling using Latent Dirichlet Allocation (LDA) and spatial analysis using a Geographic Information System (GIS). The dataset is constructed from tweets originating in the United States, extracted from Twitter.com using obesity-related queries such as 'food deserts', 'fast food', and 'childhood obesity'. The tweets are also georeferenced and time-stamped. Three cohesive and meaningful themes, namely 'childhood obesity and schools', 'obesity prevention', and 'obesity and food habits', are extracted from the LDA model. The GIS analysis of the extracted themes shows distinct spatial patterns between rural and urban areas, between northern and southern states, and between coastal and inland states. Further, relating the themes to ancillary datasets such as the US census and locations of fast food restaurants, based upon the location of the tweets in a GIS environment, opens new avenues for spatial analysis and mapping. Therefore, the techniques used in this study provide a possible toolset for computational social scientists in general, and health researchers in particular, to better understand health problems from large conversational datasets.
Pages: 90-102
Page count: 13
Related papers
20 in total
[1]   Probabilistic Topic Models [J].
Blei, David M. .
COMMUNICATIONS OF THE ACM, 2012, 55 (04) :77-84
[2]  
Blei David M., 2009, Text Mining: Classification, Clustering, and Applications, P71
[3]   Latent Dirichlet allocation [J].
Blei, DM ;
Ng, AY ;
Jordan, MI .
JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (4-5) :993-1022
[4]   Social Network Sites: Definition, History, and Scholarship [J].
Boyd, Danah M. ;
Ellison, Nicole B. .
JOURNAL OF COMPUTER-MEDIATED COMMUNICATION, 2007, 13 (01) :210-230
[5]  
Butts C. T., 2011, SAGE HDB GIS SOC RES, P222, DOI 10.4135/9781446201046.N12
[6]   Pandemics in the Age of Twitter: Content Analysis of Tweets during the 2009 H1N1 Outbreak [J].
Chew, Cynthia ;
Eysenbach, Gunther .
PLOS ONE, 2010, 5 (11)
[7]  
Culotta A, 2010, KDD WORKSH SOC MED A
[8]  
Feinerer I., 2012, R PACKAGE VERSION 0, P1
[9]  
Feinerer I, 2008, J STAT SOFTW, V25, P1
[10]   Geospatial Study of Psychiatric Mental Health-Advanced Practice Registered Nurses (PMH-APRNs) in the United States [J].
Ghosh, Debarchana ;
Sterns, Anthony A. ;
Drew, Barbara L. ;
Hamera, Edna .
PSYCHIATRIC SERVICES, 2011, 62 (12) :1506-1509