GraphFind:: enhancing graph searching by low support data mining techniques

被引:14
作者
Ferro, Alfredo [1 ,2 ]
Giugno, Rosalba [1 ]
Mongiovi, Misael [1 ]
Pulvirenti, Alfredo [1 ]
Skripin, Dmitry [1 ]
Shasha, Dennis [3 ]
机构
[1] Univ Catania, Dipartimento Matemat & Informat, I-95125 Catania, Italy
[2] Univ Catania, Dipartimento Sci Biomed, I-95125 Catania, Italy
[3] NYU, Courant Inst Math Sci, New York, NY 10012 USA
关键词
Hash Table; Query Time; Index Size; Graph Database; Query Graph;
D O I
10.1186/1471-2105-9-S4-S10
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Background: Biomedical and chemical databases are large and rapidly growing in size. Graphs naturally model such kinds of data. To fully exploit the wealth of information in these graph databases, a key role is played by systems that search for all exact or approximate occurrences of a query graph. To deal efficiently with graph searching, advanced methods for indexing, representation and matching of graphs have been proposed. Results: This paper presents GraphFind. The system implements efficient graph searching algorithms together with advanced filtering techniques that allow approximate search. It allows users to select candidate subgraphs rather than entire graphs. It implements an effective data storage based also on low-support data mining. Conclusions: GraphFind is compared with Frowns, GraphGrep and gIndex. Experiments show that GraphFind outperforms the compared systems on a very large collection of small graphs. The proposed low-support mining technique which applies to any searching system also allows a significant index space reduction.
引用
收藏
页数:10
相关论文
共 17 条
[1]  
[Anonymous], 22 INT C DAT ENG ICD
[2]  
CHENG J, 2007, P ACM SIGMOD INT C M, P857
[3]   Finding interesting associations without support pruning [J].
Cohen, E ;
Datar, M ;
Fujiwara, S ;
Gionis, A ;
Indyk, P ;
Motwani, R ;
Ullman, JD ;
Yang, C .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2001, 13 (01) :64-78
[4]   Substructure Discovery Using Minimum Description Length and Background Knowledge [J].
Cook, Diane J. ;
Holder, Lawrence B. .
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 1993, 1 :231-255
[5]   A (sub)graph isomorphism algorithm for matching large graphs [J].
Cordella, LP ;
Foggia, P ;
Sansone, C ;
Vento, M .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2004, 26 (10) :1367-1372
[6]   NetMatch: a Cytoscape plugin for searching biological networks [J].
Ferro, A. ;
Giugno, R. ;
Pigola, G. ;
Pulvirenti, A. ;
Skripin, D. ;
Bader, G. D. ;
Shasha, D. .
BIOINFORMATICS, 2007, 23 (07) :910-912
[7]  
Foggia P., 2001, PROC 3ND WORKSHOP GR, P176
[8]  
Giugno R, 2002, INT C PATT RECOG, P112, DOI 10.1109/ICPR.2002.1048250
[9]  
Hu M, 2007, PROCEEDING IEEE 23 I, P966
[10]   A database for storage and fast retrieval of structure data: A demonstration [J].
Kumar, S ;
Srinivasa, S .
19TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2003, :789-791