Lineage retrieval for scientific data processing: A survey

Cited by: 167
Authors
Bose, R [1]
Frew, J [1]
Affiliation
[1] Univ Calif Santa Barbara, Bren Sch Environm Sci & Management, Santa Barbara, CA 93106 USA
Keywords
design; documentation; experimentation; management; data lineage; data provenance; scientific data; scientific workflow; audit
DOI
10.1145/1057977.1057978
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Scientific research relies as much on the dissemination and exchange of data sets as on the publication of conclusions. Accurately tracking the lineage (origin and subsequent processing history) of scientific data sets is thus imperative for the complete documentation of scientific work. Researchers are effectively prevented from determining, preserving, or providing the lineage of the computational data products they use and create, however, because of the lack of a definitive model for lineage retrieval and a poor fit between current data management tools and scientific software. Based on a comprehensive survey of lineage research and previous prototypes, we present a metamodel to help identify and assess the basic components of systems that provide lineage retrieval for scientific data products.
Pages: 1-28
Number of pages: 28