Analysis of named entity recognition and linking for tweets

Cited by: 224
Authors
Derczynski, Leon [1]
Maynard, Diana [1]
Rizzo, Giuseppe [2,4]
van Erp, Marieke [3]
Gorrell, Genevieve [1]
Troncy, Raphael [2]
Petrak, Johann [1]
Bontcheva, Kalina [1]
Affiliations
[1] Univ Sheffield, Sheffield S1 4DP, S Yorkshire, England
[2] EURECOM, F-06904 Sophia Antipolis, France
[3] Vrije Univ Amsterdam, NL-1081 HV Amsterdam, Netherlands
[4] Univ Turin, I-10124 Turin, Italy
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
Information extraction; Named entity recognition; Entity disambiguation; Microblogs; Twitter;
DOI
10.1016/j.ipm.2014.10.006
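Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
080201 [Mechanical Manufacturing and Automation];
Abstract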
Applying natural language processing for mining and intelligent information access to tweets (a form of microblog) is a challenging, emerging research area. Unlike carefully authored news text and other longer content, tweets pose a number of new challenges, due to their short, noisy, context-dependent, and dynamic nature. Information extraction from tweets is typically performed in a pipeline, comprising consecutive stages of language identification, tokenisation, part-of-speech tagging, named entity recognition and entity disambiguation (e.g. with respect to DBpedia). In this work, we describe a new Twitter entity disambiguation dataset, and conduct an empirical analysis of named entity recognition and disambiguation, investigating how robust a number of state-of-the-art systems are on such noisy texts, what the main sources of error are, and which problems should be further investigated to improve the state of the art. © 2015 Published by Elsevier Ltd.
Pages: 32-49
Page count: 18