A Declarative Framework for Linking Entities

被引:25
作者
Burdick, Douglas [1 ]
Fagin, Ronald [1 ]
Kolaitis, Phokion G. [1 ,2 ]
Popa, Lucian [1 ]
Tan, Wang-Chiew [2 ]
机构
[1] IBM Res Almaden, 650 Harry Rd, San Jose, CA 95120 USA
[2] UC Santa Cruz, 1156 High St, Santa Cruz, CA 95060 USA
来源
ACM TRANSACTIONS ON DATABASE SYSTEMS | 2016年 / 41卷 / 03期
基金
美国国家科学基金会;
关键词
Entity linking; constraints; certain links; maximum-value solutions; Markov logic networks; maximum-probability worlds; RESOLUTION;
D O I
10.1145/2894748
中图分类号
TP [自动化技术、计算机技术];
学科分类号
080201 [机械制造及其自动化];
摘要
We introduce and develop a declarative framework for entity linking and, in particular, for entity resolution. As in some earlier approaches, our framework is based on a systematic use of constraints. However, the constraints we adopt are link-to-source constraints, unlike in earlier approaches where source-to-link constraints were used to dictate how to generate links. Our approach makes it possible to focus entirely on the intended properties of the outcome of entity linking, thus separating the constraints from any procedure of how to achieve that outcome. The core language consists of link-to-source constraints that specify the desired properties of a link relation in terms of source relations and built-in predicates such as similarity measures. A key feature of the link-to-source constraints is that they employ disjunction, which enables the declarative listing of all the reasons two entities should be linked. We also consider extensions of the core language that capture collective entity resolution by allowing interdependencies among the link relations. We identify a class of "good" solutions for entity-linking specifications, which we call maximum-value solutions and which capture the strength of a link by counting the reasons that justify it. We study natural algorithmic problems associated with these solutions, including the problem of enumerating the "good" solutions and the problem of finding the certain links, which are the links that appear in every "good" solution. We show that these problems are tractable for the core language but may become intractable once we allow interdependencies among the link relations. We also make some surprising connections between our declarative framework, which is deterministic, and probabilistic approaches such as ones based on Markov Logic Networks.
引用
收藏
页数:38
相关论文
共 33 条
[1]
Alexe Bogdan, 2013, In Search of Elegance in the Theory and Practice of Computation. Essays Dedicated to Peter Buneman: LNCS 8000, P36, DOI 10.1007/978-3-642-41660-6_3
[2]
Alur N., 2008, IBM WEBSPHERE QUALIT
[3]
[Anonymous], 2013, EDBT
[4]
[Anonymous], 2007, ACM Transactions on Knowledge Discovery from Data (TKDD), DOI [DOI 10.1145/1217299.1217304, 10.1145/1217299.1217304]
[5]
[Anonymous], 2012, SYNTH LECT DATA MANA
[6]
Large-Scale Deduplication with Constraints using Dedupalog [J].
Arasu, Arvind ;
Re, Christopher ;
Suciu, Dan .
ICDE: 2009 IEEE 25TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2009, :952-963
[7]
Arenas M., 1999, Proceedings of the Eighteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, P68, DOI 10.1145/303976.303983
[8]
Solutions and query rewriting in data exchange [J].
Arenas, Marcelo ;
Barcelo, Pablo ;
Fagin, Ronald ;
Libkin, Leonid .
INFORMATION AND COMPUTATION, 2013, 228 :28-61
[9]
Data Cleaning and Query Answering with Matching Dependencies and Matching Functions [J].
Bertossi, Leopoldo ;
Kolahi, Solmaz ;
Lakshmanan, Laks V. S. .
THEORY OF COMPUTING SYSTEMS, 2013, 52 (03) :441-482
[10]
Burdick D., 2015, 18 INT C DAT THEOR I, P25