Querying text databases for efficient information extraction

Cited by: 23
Authors
Agichtein, E [1]
Gravano, L [1]
Affiliations
[1] Columbia Univ, New York, NY 10027 USA
Source
19TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS | 2003
DOI
10.1109/ICDE.2003.1260786
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A wealth of information is hidden within unstructured text. This information is often best exploited in structured or relational form, which is suited for sophisticated query processing, for integration with relational databases, and for data mining. Current information extraction techniques extract relations from a text database by examining every document in the database, or use filters to select promising documents for extraction. The exhaustive scanning approach is not practical or even feasible for large databases, and the current filtering techniques require human involvement to maintain and to adapt to new databases and domains. In this paper, we develop an automatic query-based technique to retrieve documents useful for the extraction of user-defined relations from large text databases, which can be adapted to new domains, databases, or target relations with minimal human effort. We report a thorough experimental evaluation over a large newspaper archive that shows that we significantly improve the efficiency of the extraction process by focusing only on promising documents.
Pages: 113-124
Page count: 12