UpSizeR: Synthetically scaling an empirical relational database

Cited by: 17
Authors
Tay, Y. C. [1]
Dai, Bing Tian [2]
Wang, Daniel T. [3]
Sun, Eldora Y. [1]
Lin, Yong [1]
Lin, Yuting [1]
Affiliations
[1] National University of Singapore, Singapore 117548, Singapore
[2] Singapore Management University, Singapore 178902, Singapore
[3] Teradata, Shanghai, People's Republic of China
Keywords
Application-specific benchmarking; Synthetic data generation; Scale factor; Empirical dataset; Attribute value correlation; Social networks
DOI
10.1016/j.is.2013.07.004
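Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
080201 [Mechanical manufacturing and automation]
Abstract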
The TPC benchmarks have helped users evaluate database system performance at different scales. Although each benchmark is domain-specific, it is not equally relevant to different applications in the same domain. The present proliferation of applications also leaves many of them uncovered by the very limited number of current TPC benchmarks. There is therefore a need to develop tools for application-specific database benchmarking. This paper presents UpSizeR, a software tool that addresses the Dataset Scaling Problem: Given an empirical set of relational tables D and a scale factor s, generate a database state $\tilde{D}$ that is similar to D but s times its size. Such a tool can be useful for scaling up D for scalability testing (s > 1), scaling down for application testing (s < 1), or anonymization (s = 1). Experiments with Flickr show that query results and response times on UpSizeR output match those on crawled data. They also accurately predict throughput degradation for a scale-out test. The UpSizeR version in this paper focuses on extracting and replicating the correlation induced by the primary and foreign keys. There are many other forms of correlation involving non-key values. It is a large task to develop UpSizeR into a tool that can extract and replicate all important correlations, so community effort is required. The current UpSizeR code has therefore been released for open-source development. The ultimate objective is to replace TPC with UpSizeR, so database owners can generate benchmarks that are relevant to their applications. © 2013 Elsevier Ltd. All rights reserved.
Pages: 1168-1183
Page count: 16