An annotation management system for relational databases

Cited by: 56
Authors
Bhagwat, D [1]
Chiticariu, L [1]
Tan, WC [1]
Vijayvargiya, G [1]
Affiliations
[1] Univ Calif Santa Cruz, Dept Comp Sci, Santa Cruz, CA 95064 USA
Keywords
data provenance; lineage; annotation propagation; metadata
DOI
10.1007/s00778-005-0156-6
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
We present an annotation management system for relational databases. In this system, every piece of data in a relation is assumed to have zero or more annotations associated with it, and annotations are propagated from the source to the output as data is transformed through a query. Such an annotation management system could be used to understand the provenance (also known as lineage) of data, to track who has seen or edited a piece of data, or to record the quality of data. These are useful functionalities for applications that deal with the integration of scientific and biological data. We present an extension, pSQL, of a fragment of SQL that has three different types of annotation propagation schemes, each useful for different purposes. The default scheme propagates annotations according to where data is copied from. The default-all scheme propagates annotations according to where data is copied from among all equivalent formulations of a given query. The custom scheme allows a user to specify how annotations should propagate. We present a storage scheme for the annotations and describe algorithms for translating a pSQL query under each propagation scheme into one or more SQL queries that correctly retrieve the relevant annotations according to the specified propagation scheme. For the default-all scheme, we also show how we generate finitely many queries that can simulate the annotation propagation behavior of the set of all equivalent queries, which is possibly infinite. The algorithms are implemented, and the feasibility of the system is demonstrated by a set of experiments that we have conducted.
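To make the difference between the default and default-all schemes concrete, here is a minimal Python sketch. It is not the paper's pSQL translation or storage scheme: the `Cell` type, the helper names, and the toy relations `R` and `S` are all assumptions made for illustration. It models the single query `SELECT r.A FROM R r, S s WHERE r.A = s.A`. Under the default scheme the output value is copied from `r.A`, so it carries only the annotations of `r.A`. Under default-all, an equivalent formulation could select `s.A` instead, so the output carries the annotations of both cells.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """A value together with its annotations (hypothetical model)."""
    value: object
    annotations: frozenset = frozenset()


def cell(value, *annotations):
    """Convenience constructor for an annotated cell."""
    return Cell(value, frozenset(annotations))


def join_default(r_rows, s_rows):
    """SELECT r.A FROM R r, S s WHERE r.A = s.A under the default scheme.

    The output cell is copied from r.A, so only r.A's annotations follow it.
    """
    return [
        r["A"]
        for r in r_rows
        for s in s_rows
        if r["A"].value == s["A"].value
    ]


def join_default_all(r_rows, s_rows):
    """The same query under the default-all scheme.

    The equivalent query that selects s.A copies from s.A instead, so the
    output collects the annotations of both r.A and s.A.
    """
    return [
        Cell(r["A"].value, r["A"].annotations | s["A"].annotations)
        for r in r_rows
        for s in s_rows
        if r["A"].value == s["A"].value
    ]


# Toy relations: each row maps a column name to an annotated cell.
R = [{"A": cell(1, "a1")}, {"A": cell(2, "a2")}]
S = [{"A": cell(1, "b1")}]

default_result = join_default(R, S)
default_all_result = join_default_all(R, S)
```

In this sketch `default_result` holds one cell annotated with `a1` only, while the matching cell in `default_all_result` is annotated with both `a1` and `b1`. The paper obtains the default-all behavior by generating finitely many SQL queries rather than by hand-merging annotation sets as done here.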
Pages: 373-396
Page count: 24