Integrating open government data with Stratosphere for more transparency

Cited by: 22
Authors
Heise, Arvid [1]
Naumann, Felix [1]
Affiliations
[1] Hasso Plattner Inst, Potsdam, Germany
Source
Journal of Web Semantics | 2012 / Vol. 14
Keywords
Data integration; Data cleansing; Record linkage; Data fusion; Parallel query processing; Map-reduce
DOI
10.1016/j.websem.2012.02.002
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Governments are increasingly publishing their data to enable organizations and citizens to browse and analyze the data. However, the heterogeneity of this Open Government Data hinders meaningful search, analysis, and integration and thus limits the desired transparency. In this article, we present the newly developed data integration operators of the Stratosphere parallel data analysis framework to overcome the heterogeneity. With declaratively specified queries, we demonstrate the integration of well-known government data sources and other large open data sets at technical, structural, and semantic levels. Furthermore, we publish the integrated data on the Web in a form that enables users to discover relationships between persons, government agencies, funds, and companies. The evaluation shows that linking person entities of different data sets results in a good precision of 98.3% and a recall of 95.2%. Moreover, the integration of large data sets scales well on up to eight machines. © 2012 Elsevier B.V. All rights reserved.
Pages: 45-56
Page count: 12