Research Progress in Web Information Extraction Technology

Cited by: 20

Authors:

Chen Shaofei (陈少飞)

Hao Yanan (郝亚南)

Li Tianzhu (李天柱)

Xu Linhao (徐林昊)

Yang Wenzhu (杨文柱)

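Affiliations:

[1] College of Mathematics and Computer Science, Hebei University

[2] College of Mathematics and Computer Science, Hebei University, Baoding, Hebei

[3] Baoding, Hebei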

Source:

Journal of Hebei University (Natural Science Edition) | 2003, Issue 01

Keywords:

HTML; XML; semantics; rules; information extraction

DOI:

Not available

Chinese Library Classification (CLC) number:

TP393.09

Discipline classification code:

080402

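Abstract:

Web information extraction is currently an active research topic. Many extraction techniques based on different principles have emerged, and they differ in performance. This paper classifies existing information extraction techniques according to their underlying principles and, with reference to representative systems, analyzes and compares them along several dimensions: how semantics are attached, how patterns are defined, how rules are expressed, how semantic items are located, and how objects are located. On this basis, it identifies open problems for further research.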


Pages: 106-112

Page count: 7
