Learning information extraction rules for semi-structured and free text

Cited by: 947
Authors
Soderland, S [1]
Affiliation
[1] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
Keywords
natural language processing; information extraction; rule learning
DOI
10.1023/A:1007562322031
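Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104 [Pattern Recognition and Intelligent Systems]; 0812 [Computer Science and Technology]; 0835 [Software Engineering]; 1405 [Intelligence Science and Technology]
Abstract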
A wealth of on-line text information can be made available to automatic processing by information extraction (IE) systems. Each IE application needs a separate set of rules tuned to the domain and writing style. WHISK helps to overcome this knowledge-engineering bottleneck by learning text extraction rules automatically. WHISK is designed to handle text styles ranging from highly structured to free text, including text that is neither rigidly formatted nor composed of grammatical sentences. Such semi-structured text has largely been beyond the scope of previous systems. When used in conjunction with a syntactic analyzer and semantic tagging, WHISK can also handle extraction from free text such as news stories.
Pages: 233-272
Page count: 40