INTEGRATING NATURAL-LANGUAGE UNDERSTANDING WITH DOCUMENT STRUCTURE-ANALYSIS

Cited by: 3
Authors
TAYLOR, SL
DAHL, DA
LIPSHUTZ, M
WEIR, C
NORTON, LM
NILSON, RW
LINEBARGER, MC
Affiliation
[1] Unisys Corporation, 70 E. Swedesford Road, Paoli, Pennsylvania 19301
Keywords
DOCUMENT ANALYSIS; NATURAL LANGUAGE PROCESSING; IMAGE PROCESSING; VISION; OPTICAL CHARACTER RECOGNITION
DOI
10.1007/BF00849077
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Document understanding, the interpretation of a document from its image form, is a technology area that benefits greatly from the integration of natural language processing with image processing. We have developed a prototype Intelligent Document Understanding System (IDUS) that employs several technologies in a cooperative fashion: image processing, optical character recognition, document structure analysis, and text understanding. This paper discusses the areas of research during the development of IDUS where we have found the most benefit from integrating natural language processing and image processing: document structure analysis, optical character recognition (OCR) correction, and text analysis. We also discuss two applications that are supported by IDUS: text retrieval and automatic generation of hypertext links.
Pages: 255-276
Page count: 22