Scalable reduction of large datasets to interesting subsets

Times cited: 5
Authors
Williams, Gregory Todd [1]
Weaver, Jesse [1]
Atre, Medha [1]
Hendler, James A. [1]
Affiliation
[1] Rensselaer Polytechnic Institute, Department of Computer Science, Troy, NY 12180 USA
Source
Journal of Web Semantics | 2010 / Vol. 8 / No. 4
Keywords
Billion Triples Challenge; Scalability; Parallel; Inferencing; Query; Triplestore
DOI
10.1016/j.websem.2010.08.002
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
With a huge amount of RDF data available on the web, the ability to find and access relevant information is crucial. Traditional approaches to storing, querying, and reasoning fall short when faced with web-scale data. We present a system that combines the computational power of large clusters for enabling large-scale reasoning and data access with an efficient data structure for storing and querying the accessed data on a traditional personal computer or other resource-constrained device. We present results of using this system to load the 2009 Billion Triples Challenge dataset, materialize RDFS inferences, extract an "interesting" subset of the data using a large cluster, and further analyze the extracted data using a personal computer, all on the order of tens of minutes. (C) 2010 Elsevier B.V. All rights reserved.
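The two core steps named in the abstract, materializing RDFS inferences and then extracting an "interesting" subset, can be illustrated with a minimal single-machine sketch. This is not the authors' parallel cluster implementation. It assumes triples are plain Python tuples of prefixed-name strings, applies only a subset of the RDFS entailment rules (rdfs2, rdfs3, rdfs5, rdfs7, rdfs9, rdfs11) by naive fixpoint iteration, and uses a hypothetical `keep` predicate to stand in for whatever selection criterion defines "interesting".

```python
from typing import Callable, Iterable, List, Set, Tuple

# Assumption: a triple is a (subject, predicate, object) tuple of prefixed names.
Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"
SUB_CLASS = "rdfs:subClassOf"
SUB_PROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def pairs(closure: Set[Triple], predicate: str) -> List[Tuple[str, str]]:
    """(subject, object) pairs of every triple that uses the given predicate."""
    return [(s, o) for s, p, o in closure if p == predicate]


def rdfs_closure(triples: Iterable[Triple]) -> Set[Triple]:
    """Naive fixpoint over a subset of the RDFS rules (literal objects are not special-cased)."""
    closure = set(triples)
    while True:
        # Re-read the schema triples each pass, since rdfs5/rdfs11 can add new ones.
        sub_class = pairs(closure, SUB_CLASS)
        sub_prop = pairs(closure, SUB_PROP)
        domain = pairs(closure, DOMAIN)
        rng = pairs(closure, RANGE)
        new: Set[Triple] = set()
        for s, p, o in closure:
            new.update((s, RDF_TYPE, c) for prop, c in domain if p == prop)  # rdfs2
            new.update((o, RDF_TYPE, c) for prop, c in rng if p == prop)  # rdfs3
            new.update((s, sup, o) for sub, sup in sub_prop if p == sub)  # rdfs7
            if p == RDF_TYPE:
                new.update((s, RDF_TYPE, sup) for sub, sup in sub_class if o == sub)  # rdfs9
        # Transitivity of subClassOf (rdfs11) and subPropertyOf (rdfs5).
        new.update((a, SUB_CLASS, d) for a, b in sub_class for c, d in sub_class if b == c)
        new.update((a, SUB_PROP, d) for a, b in sub_prop for c, d in sub_prop if b == c)
        if new <= closure:  # nothing new was derived: fixpoint reached
            return closure
        closure |= new


def extract_subset(triples: Iterable[Triple], keep: Callable[[Triple], bool]) -> Set[Triple]:
    """Keep only the triples accepted by the (hypothetical) selection predicate."""
    return {t for t in triples if keep(t)}


# Usage: one instance triple plus a tiny schema.
data = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("foaf:knows", DOMAIN, "foaf:Person"),
    ("foaf:Person", SUB_CLASS, "foaf:Agent"),
}
closure = rdfs_closure(data)
agents = extract_subset(closure, lambda t: t[1] == RDF_TYPE and t[2] == "foaf:Agent")
print(sorted(agents))  # [('ex:alice', 'rdf:type', 'foaf:Agent')]
```

Here `ex:alice` is first typed `foaf:Person` through the domain of `foaf:knows` (rdfs2) and then `foaf:Agent` through the subclass axiom (rdfs9). Each pass of this loop rejoins the entire closure, which is fine for a toy graph but is exactly the kind of work that does not fit on one machine at billion-triple scale, hence the abstract's use of a large cluster for reasoning and extraction before handing the smaller subset to a personal computer.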
Pages: 365-373
Page count: 9