Building intelligent Web applications using lightweight wrappers

Cited by: 317
Authors
Sahuguet, A
Azavant, F
Affiliations
[1] Univ Penn, Dept Comp & Informat Sci, Philadelphia, PA 19104 USA
[2] Ecole Natl Super Telecommun, F-75634 Paris 13, France
Keywords
Web; XML; information extraction; wrappers;
DOI
10.1016/S0169-023X(00)00051-3
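Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 [Pattern recognition and intelligent systems]; 0812 [Computer science and technology]; 0835 [Software engineering]; 1405 [Intelligence science and technology];
Abstract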
The Web so far has been incredibly successful at delivering information to human users. So successful, in fact, that there is now an urgent need to go beyond a browsing human. Unfortunately, the Web is not yet a well-organized repository of nicely structured documents but rather a conglomerate of volatile HTML pages. To address this problem, we present the World Wide Web Wrapper Factory (W4F), a toolkit for the generation of wrappers for Web sources, that offers: (1) an expressive language to specify the extraction of complex structures from HTML pages; (2) a declarative mapping to various data formats like XML; (3) some visual tools to make the engineering of wrappers faster and easier. (C) 2001 Elsevier Science B.V. All rights reserved.
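The general idea of a wrapper, as the abstract describes it (extract structure from an HTML page, then map it declaratively to XML), can be illustrated with a minimal sketch. This is not W4F and does not use its extraction language; it relies only on the Python standard library, and the names `RowExtractor`, `wrap`, `SAMPLE_HTML`, and the sample movie data are hypothetical, chosen purely for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class RowExtractor(HTMLParser):
    """Extraction step: collect the text of every <td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows without <td> cells (e.g. header rows)
                self.rows.append(self._row)
            self._row = None


def wrap(html, record_tag, field_tags):
    """Mapping step: turn each extracted row into an XML record.

    field_tags plays the role of a declarative mapping: the i-th cell of a
    row becomes an element named field_tags[i].
    """
    parser = RowExtractor()
    parser.feed(html)
    parser.close()
    root = ET.Element(record_tag + "s")
    for row in parser.rows:
        record = ET.SubElement(root, record_tag)
        for name, value in zip(field_tags, row):
            ET.SubElement(record, name).text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical input page, invented for this example.
SAMPLE_HTML = """
<table>
  <tr><th>Title</th><th>Year</th></tr>
  <tr><td>Casablanca</td><td>1942</td></tr>
  <tr><td>Vertigo</td><td>1958</td></tr>
</table>
"""

if __name__ == "__main__":
    print(wrap(SAMPLE_HTML, "movie", ["title", "year"]))
```

Keeping the extraction step (`RowExtractor`) separate from the mapping step (`wrap`) mirrors the split between points (1) and (2) of the abstract: the same extracted rows could be mapped to a different output format without touching the parser.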
Pages: 283-316
Number of pages: 34