Stylus: A Strongly-Typed Store for Serving Massive RDF Data

Cited by: 13
Authors
He, Liang [1, 2]
Shao, Bin [2]
Li, Yatao [2]
Xia, Huanhuan [2]
Xiao, Yanghua [3]
Chen, Enhong [1]
Chen, Liang Jeff [1]
Affiliations
[1] Univ Sci & Technol China, Hefei, Anhui, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
[3] Fudan Univ, Shanghai, Peoples R China
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2017 / Vol. 11 / No. 02
Funding
U.S. National Science Foundation; National Natural Science Foundation of China;
Keywords
WEB DATA-MANAGEMENT; ENGINE; DATABASES; QUERIES;
DOI
10.14778/3149193.3149200
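Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
080201 [Mechanical Manufacturing and Automation];
Abstract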
RDF is one of the most commonly used knowledge representation forms. Many highly influential knowledge bases, such as Freebase and PubChemRDF, are in RDF format. An RDF data set is usually represented as a collection of subject-predicate-object triples. Despite the flexibility of RDF triples, it is challenging to serve SPARQL queries on RDF data efficiently by directly managing triples, for two reasons. First, query processing requires heavy joins over a large number of triples, resulting in many data scans and large, redundant intermediate results. Second, the weakly-typed triple representation provides suboptimal random access, typically with logarithmic complexity. Unfortunately, this data access challenge cannot easily be met by a better query optimizer, because large graph processing is extremely I/O-intensive. In this paper, we argue that a strongly-typed graph representation is the key to high-performance RDF query processing. We propose Stylus, a strongly-typed store for serving massive RDF data. Stylus exploits a strongly-typed storage scheme to boost the performance of RDF query processing. The storage scheme is essentially a materialized join view on entities; it can thus eliminate a large number of unnecessary joins on triples. Moreover, it is equipped with a compact representation for intermediate results and an efficient graph-decomposition-based query planner. Experimental results on both synthetic and real-life RDF data sets confirm that the proposed approach can dramatically boost the performance of SPARQL query processing.
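The idea of a storage scheme that acts as a materialized join view on entities can be illustrated with a small Python sketch. This is not Stylus's actual storage layout or API. The toy triples, the function names `build_entity_store` and `names_of_employees`, and the choice to use an entity's sorted predicate set as its "type" are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy data set of (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "name", "Alice"),
    ("alice", "worksAt", "acme"),
    ("bob", "name", "Bob"),
    ("bob", "worksAt", "acme"),
    ("acme", "name", "Acme Corp"),
]


def build_entity_store(triples):
    """Group triples by subject so that each entity becomes one record.

    Each record is (entity_type, properties). The entity type is assumed
    here to be the sorted tuple of predicates the entity carries.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        grouped[subject][predicate].append(obj)

    store = {}
    for subject, props in grouped.items():
        entity_type = tuple(sorted(props))
        store[subject] = (entity_type, {p: tuple(v) for p, v in props.items()})
    return store


def names_of_employees(store, company):
    """Answer the pattern { ?x worksAt <company> . ?x name ?n }.

    Both predicates live in the same entity record, so no self-join over
    the triple list is needed: one pass over the records is enough.
    """
    names = []
    for entity_type, props in store.values():
        if "worksAt" in entity_type and "name" in entity_type:
            if company in props["worksAt"]:
                names.extend(props["name"])
    return sorted(names)


store = build_entity_store(TRIPLES)
result = names_of_employees(store, "acme")
```

In a plain triple table, the same two-pattern query would join the `worksAt` triples with the `name` triples on the subject. In the sketch, grouping by entity performs that join once, at load time.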
Pages: 203-216
Number of pages: 14