Lineage tracing for general data warehouse transformations

Cited by: 111
Authors
Cui, YW [1]
Widom, J [1]
Affiliations
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
Keywords
data lineage; data warehouse; transformation; lineage tracing; inverse
DOI
10.1007/s00778-002-0083-8
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Data warehousing systems integrate information from operational data sources into a central repository to enable analysis and mining of the integrated information. During the integration process, source data typically undergoes a series of transformations, which may vary from simple algebraic operations or aggregations to complex "data cleansing" procedures. In a warehousing environment, the data lineage problem is that of tracing warehouse data items back to the original source items from which they were derived. We formally define the lineage tracing problem in the presence of general data warehouse transformations, and we present algorithms for lineage tracing in this environment. Our tracing procedures take advantage of known structure or properties of transformations when present, but also work in the absence of such information. Our results can be used as the basis for a lineage tracing tool in a general warehousing setting, and also can guide the design of data warehouses that enable efficient lineage tracing.
Pages: 41-58
Page count: 18