Using machine learning methods for disambiguating place references in textual documents

Cited by: 39
Authors
Santos, Joao [1]
Anastacio, Ivo [1]
Martins, Bruno [1]
Affiliations
[1] Inst Super Tecn, Lisbon, Portugal
Keywords
Place reference disambiguation; Geographic text mining and retrieval; Entity linking in text; Learning to rank
DOI
10.1007/s10708-014-9553-y
Chinese Library Classification
P9 [Physical geography]; K9 [Geography]
Discipline classification codes
0705; 070501
Abstract
This paper presents a machine learning method for disambiguating place references in text. Solving this task can have important applications in the digital humanities and computational social sciences, by supporting the geospatial analysis of large document collections. We combine multiple features that capture the similarity between candidate disambiguations, the place references, and the context where the place references occur, in order to rank and choose from a set of candidate disambiguations, obtained from a knowledge base containing geospatial coordinates and textual descriptions for different places from all around the world. The proposed method was evaluated through English corpora used in previous work in this area, and also with a subset of the English Wikipedia. Experimental results demonstrate that the proposed method is indeed effective, showing that out-of-the-box learning algorithms and relatively simple features can obtain a high accuracy in this task.
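The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two features (string similarity of names, token overlap with the context), the hand-set weights, the function names, and the toy knowledge-base entries are all assumptions made for illustration. A learning-to-rank model would estimate the scoring function from annotated data rather than use fixed weights.

```python
from difflib import SequenceMatcher


def name_similarity(mention: str, candidate_name: str) -> float:
    """String similarity between the place reference and a candidate's name."""
    return SequenceMatcher(None, mention.lower(), candidate_name.lower()).ratio()


def context_similarity(context: str, description: str) -> float:
    """Jaccard overlap between context tokens and the candidate's description."""
    a, b = set(context.lower().split()), set(description.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def rank_candidates(mention, context, candidates, weights=(0.5, 0.5)):
    """Return candidates sorted by a weighted sum of the two features.

    The weights are hand-set placeholders (an assumption); a learned
    ranking model would replace this fixed scoring function.
    """
    w_name, w_ctx = weights

    def score(c):
        return (w_name * name_similarity(mention, c["name"])
                + w_ctx * context_similarity(context, c["description"]))

    return sorted(candidates, key=score, reverse=True)


# Hypothetical knowledge-base entries (toy data, not taken from the paper).
candidates = [
    {"name": "Lisbon", "description": "capital city of Portugal on the Tagus river",
     "lat": 38.72, "lon": -9.14},
    {"name": "Lisbon", "description": "town in New Hampshire United States",
     "lat": 44.21, "lon": -71.91},
]
context = "the conference was held in Lisbon the capital of Portugal"
best = rank_candidates("Lisbon", context, candidates)[0]
print(best["lat"], best["lon"])
```

Both candidates match the name exactly, so the context feature decides the ranking here: the entry whose description shares more tokens with the surrounding text is ranked first.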
Pages: 375-392
Page count: 18