A Model of Process Documentation to Determine Provenance in Mash-Ups

Cited by: 23

Authors:

Groth, Paul [1]

Miles, Simon [2]

Moreau, Luc [3]

Affiliations:

[1] Univ So Calif, Inst Informat Sci, Marina Del Rey, CA 90292 USA

[2] Kings Coll London, Dept Comp Sci, London WC2T 2LS, England

[3] Univ Southampton, Sch Elect & Comp Sci, Southampton SO17 1BJ, Hants, England

Source:

ACM TRANSACTIONS ON INTERNET TECHNOLOGY | 2009 / Vol. 9 / No. 1

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK

Keywords:

Design; Documentation; Standardization; Process; process documentation; provenance; data model; concept maps; mash-ups; LINEAGE;

DOI:

10.1145/1462159.1462162

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Through technologies such as RSS (Really Simple Syndication), Web Services, and AJAX (Asynchronous JavaScript and XML), the Internet has facilitated the emergence of applications that are composed from a variety of services and data sources. Through tools such as Yahoo Pipes, these "mash-ups" can be composed in a dynamic, just-in-time manner from components provided by multiple institutions (e.g., Google, Amazon, your neighbor). However, when using these applications, it is not apparent where data comes from or how it is processed. Thus, to inspire trust and confidence in mash-ups, it is critical to be able to analyze their processes after the fact. These trailing analyses, in particular the determination of the provenance of a result (i.e., the process that led to it), are enabled by process documentation, which is documentation of an application's past process created by the components of that application at execution time. In this article, we define a generic conceptual data model that supports the autonomous creation of attributable, factual process documentation for dynamic multi-institutional applications. The data model is instantiated using two Internet formats, OWL and XML, and is evaluated with respect to questions about the provenance of results generated by a complex bioinformatics mash-up.

Pages: 31

