The requirements of using provenance in e-science experiments

Cited by: 72
Authors
Miles S. [1]
Groth P. [1]
Branco M. [1]
Moreau L. [1]
Affiliations
[1] School of Electronics and Computer Science, University of Southampton
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
E-Science; Grid; Provenance; Requirements; Use case; Workflow;
DOI
10.1007/s10723-006-9055-3
Abstract
In e-Science experiments, it is vital to record the experimental process for later use, such as interpreting results, verifying that the correct process took place, or tracing where data came from. The process that led to some data is called the provenance of that data, and a provenance architecture is the software architecture for a system that provides the functionality needed to record, store and use process documentation to determine the provenance of data items. However, there has been little principled analysis of what is actually required of a provenance architecture, so it is impossible to determine the functionality such an architecture would ideally support. In this paper, we present use cases for a provenance architecture from current experiments in biology, chemistry, physics and computer science, and analyse the use cases to determine the technical requirements of a generic, technology- and application-independent architecture. We propose an architecture that meets these requirements, analyse its features compared with other approaches, and evaluate a preliminary implementation by attempting to realise two of the use cases. © Springer Science + Business Media B.V. 2006.
Pages: 1-25
Number of pages: 24