The Twitter of Babel: Mapping World Languages through Microblogging Platforms

Cited by: 165
Authors
Mocanu, Delia [1]
Baronchelli, Andrea [1]
Perra, Nicola [1]
Goncalves, Bruno [2]
Zhang, Qian [1]
Vespignani, Alessandro [1, 3, 4]
Affiliations
[1] Northeastern Univ, Lab Modeling Biol & Sociotech Syst, Boston, MA 02115 USA
[2] Aix Marseille Univ, CNRS, CPT, UMR 7332, Marseille, France
[3] Harvard Univ, Inst Quantitat Social Sci, Cambridge, MA 02138 USA
[4] Inst Sci Interchange Fdn, Turin, Italy
Source
PLOS ONE | 2013 / Vol. 8 / Issue 04
Funding
National Science Foundation (USA);
Keywords
DOI
10.1371/journal.pone.0061981
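Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes
07; 0710; 09;
Abstract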
Large-scale analysis and statistics of socio-technical systems that just a few short years ago would have required substantial economic and human resources can nowadays be conveniently performed by mining the enormous amount of digital data produced by human activities. Although a characterization of several aspects of our societies is emerging from the data revolution, a number of questions concerning the reliability and the biases inherent to the big data "proxies" of social life are still open. Here, we survey worldwide linguistic indicators and trends through the analysis of a large-scale dataset of microblogging posts. We show that available data allow for the study of language geography at scales ranging from country-level aggregation to specific city neighborhoods. The high resolution and coverage of the data allow us to investigate different indicators such as the linguistic homogeneity of different countries, the touristic seasonal patterns within countries and the geographical distribution of different languages in multilingual regions. This work highlights the potential of geolocalized studies of open data sources to improve current analysis and develop indicators for major social phenomena in specific communities.
Pages: 9