Measuring XML document similarity: a case study for evaluating information extraction systems

Times cited: 30
Authors
Canfora, G. [1]
Cerulo, L. [1]
Scognamiglio, R. [1]
Affiliation
[1] Univ Sannio, Dept Engn, Res Ctr Software Technol, RCOST, I-82100 Benevento, Italy
Source
10TH INTERNATIONAL SYMPOSIUM ON SOFTWARE METRICS, PROCEEDINGS | 2004
DOI
10.1109/METRIC.2004.1357889
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
Measuring the similarity between trees, such as XML-structured information, plays an important role in many applications, in particular in evaluating the effectiveness of Information Extraction Systems (IES). In this paper we present our experience in evaluating IES in terms of extraction effectiveness and adaptation effectiveness. In the first part of the paper we introduce a similarity measure between XML trees based on a common subtree detection algorithm; we then present, as an example application, a case study evaluating the effectiveness of a group of IES.
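The abstract does not define the measure itself, so the following is only a minimal sketch of what a common-subtree-based XML similarity can look like, not the paper's algorithm. It assumes a top-down, order-preserving common subtree (the "simple tree matching" idea), treats two nodes as equal when both their tag and their trimmed text agree, and normalises the matched size m as 2m / (|A| + |B|). The node-label choice, the normalisation and all function names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def label(node: ET.Element) -> tuple:
    # Assumption: a node is identified by its tag plus its trimmed text content.
    return (node.tag, (node.text or "").strip())


def tree_size(node: ET.Element) -> int:
    """Number of element nodes in the subtree rooted at node."""
    return 1 + sum(tree_size(child) for child in node)


def common_subtree_size(a: ET.Element, b: ET.Element) -> int:
    """Size of the largest top-down common subtree of a and b.

    Roots must match; children are aligned in document order with an
    LCS-style dynamic programme (simple tree matching, an assumed technique).
    """
    if label(a) != label(b):
        return 0
    ca, cb = list(a), list(b)
    # m[i][j]: best match between the first i children of a and first j of b
    m = [[0] * (len(cb) + 1) for _ in range(len(ca) + 1)]
    for i in range(1, len(ca) + 1):
        for j in range(1, len(cb) + 1):
            m[i][j] = max(
                m[i - 1][j],
                m[i][j - 1],
                m[i - 1][j - 1] + common_subtree_size(ca[i - 1], cb[j - 1]),
            )
    return 1 + m[len(ca)][len(cb)]


def xml_similarity(xml_a: str, xml_b: str) -> float:
    """Hypothetical normalised similarity in [0, 1]: 2 * common / (|A| + |B|)."""
    a, b = ET.fromstring(xml_a), ET.fromstring(xml_b)
    common = common_subtree_size(a, b)
    return 2 * common / (tree_size(a) + tree_size(b))


if __name__ == "__main__":
    expected = "<book><title>XML</title><author>Canfora</author></book>"
    extracted = "<book><title>XML</title><author>Cerulo</author></book>"
    print(xml_similarity(expected, extracted))
```

In an IES evaluation setting, one document would be the manually prepared expected output and the other the system's extraction. In the example above, the root and the title match but the author value differs, so 2 of 3 nodes are shared on each side and the score is 2/3. Identical documents score 1.0, and documents with different roots score 0.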
Pages: 36-45
Number of pages: 10