Counting twig matches in a tree

被引:32
作者
Chen, ZY [1 ]
Jagadish, HV [1 ]
Korn, F [1 ]
Koudas, N [1 ]
Muthukrishnan, S [1 ]
Ng, R [1 ]
Srivastava, D [1 ]
机构
[1] Cornell Univ, Ithaca, NY 14853 USA
来源
17TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS | 2001年
关键词
D O I
10.1109/ICDE.2001.914874
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We describe efficient algorithms for accurately estimating the number of matches of a small node-labeled tree, i.e., a twig, in a large node-labeled tree, using a summary data structure. This problem is of interest for queries on XML and other hierarchical data, to provide query feedback and for cost-based query optimization. Our summary data structure scalably represents approximate frequency information about twiglets (i.e., small twigs) in the data tree. Given a twig query, the number of matches is estimated by creating a set of query twiglets, and combining two complementary approaches: Set Hashing, used to estimate the number of matches of each query twiglet, and Maximal Overlap, used to combine the query twiglet estimates into an estimate for the twig query. We propose several estimation algorithms that apply these approaches on query twiglets formed using variations on different twiglet decomposition techniques. We present an extensive experimental evaluation using several real XML data sets, with a variety of twig queries. Our results demonstrate that accurate and robust estimates can be achieved, even with limited space.
引用
收藏
页码:595 / 604
页数:10
相关论文
共 15 条
[1]  
Bray Tim, 1998, Extensible markup language
[2]  
Broder A. Z., 1998, Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, P327, DOI 10.1145/276698.276781
[3]   On the resemblance and containment of documents [J].
Broder, AZ .
COMPRESSION AND COMPLEXITY OF SEQUENCES 1997 - PROCEEDINGS, 1998, :21-29
[4]  
CHEN Z, 2000, P ACM S PRINC DAT SY
[5]   Size-estimation framework with applications to transitive closure and reachability [J].
Cohen, E .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 1997, 55 (03) :441-453
[6]  
COHEN E, 2000, IN PRESS P 16 ANN IE
[7]  
Deutsch A., 1998, XML QL QUERY LANGUAG
[8]  
Howes T., 1999, UNDERSTANDING DEPLOY
[9]  
JAGADISH HV, 1999, P INT C VER LARG DAT
[10]  
JAGADISH HV, 1999, P ACM S PRINC DAT SY