A journey to Semantic Web query federation in the life sciences

Times cited: 29
Authors
Cheung, Kei-Hoi [1]
Frost, H. Robert [2]
Marshall, M. Scott [3]
Prud'hommeaux, Eric [4]
Samwald, Matthias [5, 6]
Zhao, Jun [7]
Paschke, Adrian [8]
Affiliations
[1] Yale Univ, Sch Med, Ctr Med Informat, New Haven, CT 06511 USA
[2] VectorC LLC, Hanover, NH 03755 USA
[3] Univ Amsterdam, Inst Informat, NL-1012 WX Amsterdam, Netherlands
[4] MIT, World Wide Web Consortium, Cambridge, MA 02139 USA
[5] Natl Univ Ireland Galway, Digital Enterprise Res Inst, Galway, Ireland
[6] Konrad Lorenz Inst Evolut & Cognit Res, Altenberg, Austria
[7] Univ Oxford, Dept Zool, Oxford OX1 3PS, England
[8] Free Univ Berlin, Berlin, Germany
Source
BMC BIOINFORMATICS | 2009 / Volume 10
Funding
Science Foundation Ireland; UK Engineering and Physical Sciences Research Council;
Keywords
BIOINFORMATICS; ACCESS; SYSTEM;
DOI
10.1186/1471-2105-10-S10-S10
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: As interest in adopting the Semantic Web in the biomedical domain continues to grow, Semantic Web technology has been evolving and maturing. A variety of technological approaches, including triplestore technologies, SPARQL endpoints, Linked Data, and the Vocabulary of Interlinked Datasets (voiD), have emerged in recent years. In addition to supporting data warehouse construction, these approaches can be used to support dynamic query federation. As a community effort, the BioRDF task force, within the Semantic Web for Health Care and Life Sciences Interest Group, is exploring how these emerging approaches can be utilized to execute distributed queries across different neuroscience data sources.
Methods and results: We have created two health care and life science knowledge bases. We have explored a variety of Semantic Web approaches to describe, map, and dynamically query multiple datasets. We have demonstrated several federation approaches that integrate diverse types of information about neurons and receptors that play an important role in basic, clinical, and translational neuroscience research. In particular, we have created a prototype receptor explorer which uses OWL mappings to provide an integrated list of receptors and executes individual queries against different SPARQL endpoints. We have also employed the AIDA Toolkit, which is directed at groups of knowledge workers who cooperatively search, annotate, interpret, and enrich large collections of heterogeneous documents from diverse locations. We have explored a tool called "FeDeRate", which enables a global SPARQL query to be decomposed into subqueries against remote databases that offer either SPARQL or SQL query interfaces. Finally, we have explored how to use voiD to create metadata for describing datasets exposed as Linked Data URIs or SPARQL endpoints.
Conclusion: We have demonstrated the use of a set of novel and state-of-the-art Semantic Web technologies in support of a neuroscience query federation scenario. We have identified both the strengths and weaknesses of these technologies. While the Semantic Web offers a global data model, including the use of Uniform Resource Identifiers (URIs), the proliferation of semantically equivalent URIs hinders large-scale data integration. Our work helps direct research and tool development, which will be of benefit to this community.
Pages: 16