Statistical search on the semantic web

Cited by: 21
Authors
Kobayashi, Norio [1]
Toyoda, Tetsuro [1]
Affiliations
[1] RIKEN, Genom Sci Ctr, Computat & Expt Syst Biol Grp, Integrat Om Res Team, Kanagawa 2300045, Japan
DOI
10.1093/bioinformatics/btn054
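Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 [Biochemistry and Molecular Biology]; 081704 [Applied Chemistry];
Abstract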
Motivation: Statistical analysis of links on the Semantic Web is important for various evaluation purposes, such as quantifying an individual's scientific research output based on citation links. SPARQL has been proposed as a standardized query language for the Semantic Web and is intuitively understandable; however, it does not adequately support statistical evaluation of semantic links.
Results: We have extended SPARQL to a novel Resource Description Framework (RDF) query language termed General and Rapid Association Study Query Language (GRASQL) to generate inferences connecting semantic Boolean-based deduction and statistical evaluation of RDF resources. We have verified the descriptive capability of GRASQL by writing GRASQL queries for practical biomedical search patterns, including in silico positional cloning studies, and for ranking researchers in a specific domain of expertise by introducing the k index, the number of papers containing specific keywords that a researcher publishes in a fixed period. We have also developed a search engine termed General and Rapid Association Study Engine (GRASE), which executes a restricted variety of GRASQL queries by requesting a dynamic and comprehensive evaluation of the statistical significance of intersections between each group of documents assigned to URIs and those documents matching user-specified keywords and omics conditions. By performing practical in silico positional cloning searches with GRASE, we show the relevance of our approach on the Semantic Web for solving biomedical knowledge discovery problems.
Pages: 1002-1010
Number of pages: 9