A weighted multi-attribute method for matching user-generated Points of Interest

Cited by: 75
Authors
McKenzie, Grant [1]
Janowicz, Krzysztof [1]
Adams, Benjamin [2]
Affiliations
[1] Univ Calif Santa Barbara, Dept Geog, Santa Barbara, CA 93106 USA
[2] Univ Auckland, Dept Comp Sci, Ctr eRes, Auckland 1, New Zealand
Keywords
Points of Interest; matching; Linked Data; volunteered geographic information
DOI
10.1080/15230406.2014.880327
Chinese Library Classification number
P9 [Physical Geography]; K9 [Geography]
Discipline classification code
0705; 070501
Abstract
To a large degree, the attraction of Big Data lies in the variety of its heterogeneous multi-thematic and multi-dimensional data sources and not merely its volume. Fully exploiting this variety, however, requires conflation, which is a two-step process. First, one has to establish identity relations between information entities across different data sources; second, attribute values have to be merged according to certain procedures that avoid logical contradictions. The first step, also called matching, can be thought of as a weighted combination of common attributes according to some similarity measures. In this work, we propose such a matching based on multiple attributes of Points of Interest (POI) from the Location-based Social Network Foursquare and the local directory service Yelp. While both contain overlapping attributes that can be used for matching, they have specific strengths and weaknesses that make their conflation desirable. For instance, Foursquare offers information about user check-ins to places, while Yelp specializes in user-contributed reviews. We present a weighted multi-attribute matching strategy, evaluate its performance, and discuss application areas that benefit from a successful matching. Finally, we also outline how the established POI matches can be stored as Linked Data on the Semantic Web. Our strategy can automatically match 97% of randomly selected Yelp POI to their corresponding Foursquare entities.
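The idea of matching as "a weighted combination of common attributes according to some similarity measures" can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say which attributes, similarity measures, weights, or thresholds were used. The two attributes (name and location), the `WEIGHTS` values, the `scale_m` decay parameter, the `threshold`, and all function names below are assumptions chosen only for illustration.

```python
import math
from difflib import SequenceMatcher

# Hypothetical weights; the abstract does not report the actual values.
WEIGHTS = {"name": 0.6, "location": 0.4}


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres on a spherical Earth."""
    radius = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def location_similarity(p: dict, q: dict, scale_m: float = 100.0) -> float:
    """Map distance to (0, 1] by exponential decay; scale_m is an assumed parameter."""
    dist = haversine_m(p["lat"], p["lon"], q["lat"], q["lon"])
    return math.exp(-dist / scale_m)


def match_score(p: dict, q: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average of the per-attribute similarities."""
    sims = {
        "name": name_similarity(p["name"], q["name"]),
        "location": location_similarity(p, q),
    }
    total = sum(weights.values())
    return sum(weights[k] * sims[k] for k in weights) / total


def best_match(poi: dict, candidates: list, threshold: float = 0.7):
    """Return the highest-scoring candidate, or None if it falls below the threshold."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: match_score(poi, c))
    return best if match_score(poi, best) >= threshold else None
```

With made-up records, calling `best_match(yelp_poi, foursquare_candidates)` would return the candidate whose name and coordinates are jointly closest to the Yelp entry, or `None` when no candidate is similar enough. Normalizing by the sum of the weights keeps the score in [0, 1] even if the weights do not sum to one.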
Pages: 125-137
Number of pages: 13