Grammars have exceptions

Cited by: 66
Authors
Crescenzi, V
Mecca, G
Affiliations
[1] Univ Roma Tre, Dipartimento Informat & Automaz, I-00146 Rome, Italy
[2] Univ Basilicata, DIFA, I-85100 Potenza, Italy
Keywords
wrappers; grammars; exceptions; documents; Web
DOI
10.1016/S0306-4379(98)00028-3
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Extending database-like techniques to semi-structured and Web data sources is becoming a prominent research field. These data sources are essentially collections of textual documents. Hence, in this context, one of the key tasks consists in wrapping documents to build database abstractions of their content that can be manipulated using high-level tools. However, the degree of heterogeneity and the lack of structure make standard grammar parsers excessively rigid, and often unable to capture the richness of constructs in these documents. This paper presents MINERVA, a formalism for writing wrappers around Web sites and other textual data sources. The key feature of MINERVA is the attempt to couple the benefits of a declarative, grammar-based approach with the flexibility of procedural programming. This is done by enriching regular grammars with an explicit exception-handling mechanism. Contributions of the paper stand in the definition of the formalism, and in the description of its implementation, which relies on a number of ad-hoc techniques for parsing documents, among which an extension of the traditional LL(1) policy based on dynamic tokenization. (C) 1998 Published by Elsevier Science Ltd. All rights reserved.
Pages: 539-565
Page count: 27