A practical algorithm to find the best subsequence patterns

被引:6
作者
Hirao, M [1 ]
Hoshino, H [1 ]
Shinohara, A [1 ]
Takeda, M [1 ]
Arikawa, S [1 ]
机构
[1] Kyushu Univ 33, Dept Informat, Fukuoka 8128581, Japan
关键词
optimal pattern discovery; subsequence; subsequence automata;
D O I
10.1016/S0304-3975(02)00182-2
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Given two sets of strings, consider the problem to find a subsequence that is common to one set but never appears in the other set. We regard it to find a subsequence pattern which separates these two sets. The problem is known to be NP-complete. We naturally generalize it to an optimization problem, where we try to find a subsequence pattern which maximally separates these two sets. We provide a practical algorithm to solve it exactly. Our algorithm uses two pruning heuristics based on the properties of subsequence languages, and utilizes the data structure called subsequence automata. We report some experimental results, which show these heuristics and the data structure contribute to reduce the search time. (C) 2002 Elsevier Science B.V. All rights reserved.
引用
收藏
页码:465 / 479
页数:15
相关论文
共 22 条
[1]  
AGRAWAL R, 1995, PROC INT CONF DATA, P3, DOI 10.1109/ICDE.1995.380415
[2]   FINDING PATTERNS COMMON TO A SET OF STRINGS [J].
ANGLUIN, D .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 1980, 21 (01) :46-62
[3]  
[Anonymous], 2000, Proceedings of the 19th ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems (PODS'00)
[4]  
[Anonymous], 1995, P 1 SIGKDD INT C KNO
[5]  
[Anonymous], P ALT 91
[6]   SEARCHING SUBSEQUENCES [J].
BAEZAYATES, RA .
THEORETICAL COMPUTER SCIENCE, 1991, 78 (02) :363-376
[7]  
CALIFANO A, SPLASH STRUCTURAL PA
[8]  
CROCHEMORE M, 1999, IGM9913
[9]  
Das G, 1997, LECT NOTES COMPUT SC, V1264, P12
[10]  
Feldman R., 1997, Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, P167