Implementation of a Multilingual Investment Information Extraction System

Cited by: 3

Authors:

李芳

盛焕烨

张冬茉

Affiliation:

[1] Department of Computer Science and Engineering, Shanghai Jiao Tong University

Source:

Journal of Shanghai Jiao Tong University | 2004, Issue 01

Keywords:

template generation; information extraction; multilingual information extraction; Internet applications

DOI:

10.16183/j.cnki.jsjtu.2004.01.006

Chinese Library Classification (CLC) number:

TP311.52

Subject classification code:

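Abstract:

The experimental multilingual investment information extraction system lets users query the Chinese investment information in a corpus with keywords or restricted natural-language questions in Chinese, English, or German. It consists of a language processing module, a query processing module, an information extraction core, and a dynamic interactive acquisition module. Its main features are: (1) language-independent templates combined with language-dependent patterns, which keep extraction processing consistent across languages; and (2) predefined extraction templates combined with dynamically acquired templates, which compensates for the dependence of information extraction on fixed templates and makes the system more robust. The average precision over the extracted slots is 86.27%. Dynamically acquired templates reach a precision of 85.27%; if about five example sentences are manually revised, the precision of dynamic interactive template acquisition rises to 88.55%, an improvement of about 3 percentage points.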
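The split between a language-independent template and language-dependent patterns can be illustrated with a minimal Python sketch. It is not the authors' implementation, which the abstract does not describe. The slot names (`investor`, `amount`, `target`), the regular expressions, and the helpers `add_pattern` and `extract` are all hypothetical, chosen only to show how one slot schema can be filled by per-language patterns and how a pattern acquired later can be registered beside the predefined ones.

```python
import re

# Hypothetical language-independent template: only the slot names are shared.
INVESTMENT_SLOTS = ("investor", "amount", "target")

# Hypothetical language-dependent patterns; each must define every slot
# of the template as a named group.
PATTERNS = {
    "en": [re.compile(
        r"(?P<investor>[A-Z]\w+) invested (?P<amount>[\d.]+ \w+) in (?P<target>[A-Z]\w+)"
    )],
    "zh": [re.compile(
        r"(?P<investor>\w+?)向(?P<target>\w+?)投资(?P<amount>[\d.]+\w+)"
    )],
}


def add_pattern(lang, regex):
    """Register an extra pattern, standing in for a dynamically acquired one."""
    pattern = re.compile(regex)
    missing = set(INVESTMENT_SLOTS) - set(pattern.groupindex)
    if missing:
        raise ValueError(f"pattern lacks slots: {sorted(missing)}")
    PATTERNS.setdefault(lang, []).append(pattern)


def extract(text, lang):
    """Fill the shared template using the first matching pattern for `lang`."""
    for pattern in PATTERNS.get(lang, []):
        match = pattern.search(text)
        if match:
            return {slot: match.group(slot) for slot in INVESTMENT_SLOTS}
    return None


# A German pattern added after the fact, alongside the predefined ones.
add_pattern(
    "de",
    r"(?P<investor>[A-Z]\w+) investierte (?P<amount>[\d.]+ \w+) in (?P<target>[A-Z]\w+)",
)
```

Whatever the language, the caller receives the same dictionary of slots, which is the consistency property the abstract attributes to the language-independent template. A call such as `extract("甲公司向乙公司投资500万元", "zh")` and a call on an English sentence return results with identical keys.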

Citation

Pages: 21-25

Page count: 5
