Improving the Quality of Linked Data Using Statistical Distributions

Cited by: 164
Authors
Paulheim, Heiko [1]
Bizer, Christian [1]
Affiliation
[1] Univ Mannheim, Data & Web Sci Grp, D-68131 Mannheim, Germany
Keywords
Data Quality; DBpedia; Error Detection; NELL; Noisy Data; Semi-Structured Data; Type Completion;
DOI
10.4018/ijswis.2014040104
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification code
140502 [Artificial Intelligence];
Abstract
Linked Data on the Web is created from structured data sources (such as relational databases), from semi-structured sources (such as Wikipedia), or from unstructured sources (such as text). In the latter two cases, the generated Linked Data will likely be noisy and incomplete. In this paper, we present two algorithms that exploit statistical distributions of properties and types to enhance the quality of incomplete and noisy Linked Data sets: SDType adds missing type statements, and SDValidate identifies faulty statements. Neither algorithm uses external knowledge, i.e., both operate only on the data itself. We evaluate the algorithms on the DBpedia and NELL knowledge bases, showing that they are both accurate and scalable. Both algorithms have been used for building the DBpedia 3.9 release: with SDType, 3.4 million missing type statements have been added, and with SDValidate, 13,000 erroneous RDF statements have been removed from the knowledge base.
Pages: 63-86
Number of pages: 24