Lineage retrieval for scientific data processing: A survey

Cited by: 167
Authors
Bose, R [1]
Frew, J [1]
Affiliation
[1] Univ Calif Santa Barbara, Bren Sch Environm Sci & Management, Santa Barbara, CA 93106 USA
Keywords
design; documentation; experimentation; management; data lineage; data provenance; scientific data; scientific workflow; audit
DOI
10.1145/1057977.1057978
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Scientific research relies as much on the dissemination and exchange of data sets as on the publication of conclusions. Accurately tracking the lineage (origin and subsequent processing history) of scientific data sets is thus imperative for the complete documentation of scientific work. Researchers are effectively prevented from determining, preserving, or providing the lineage of the computational data products they use and create, however, because of the lack of a definitive model for lineage retrieval and a poor fit between current data management tools and scientific software. Based on a comprehensive survey of lineage research and previous prototypes, we present a metamodel to help identify and assess the basic components of systems that provide lineage retrieval for scientific data products.
Pages: 1-28
Page count: 28