INTEGRATING NATURAL-LANGUAGE UNDERSTANDING WITH DOCUMENT STRUCTURE-ANALYSIS

Cited by: 3
Authors
TAYLOR, SL
DAHL, DA
LIPSHUTZ, M
WEIR, C
NORTON, LM
NILSON, RW
LINEBARGER, MC
Affiliation
[1] Unisys Corporation, 70 E. Swedesford Road, Paoli, Pennsylvania 19301
Keywords
DOCUMENT ANALYSIS; NATURAL LANGUAGE PROCESSING; IMAGE PROCESSING; VISION; OPTICAL CHARACTER RECOGNITION;
DOI
10.1007/BF00849077
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Document understanding, the interpretation of a document from its image form, is a technology area that benefits greatly from integrating natural language processing with image processing. We have developed a prototype Intelligent Document Understanding System (IDUS) that employs several technologies cooperatively: image processing, optical character recognition, document structure analysis, and text understanding. This paper discusses the areas of research during the development of IDUS where we found the most benefit from integrating natural language processing and image processing: document structure analysis, optical character recognition (OCR) correction, and text analysis. We also discuss two applications supported by IDUS: text retrieval and automatic generation of hypertext links.
Pages: 255-276
Page count: 22