SEGMENTATION AND CLASSIFICATION OF MIXED TEXT/GRAPHICS/IMAGE DOCUMENTS

Cited by: 34
Authors
FAN, KC
LIU, CH
WANG, YK
Affiliation
[1] Institute of Computer Science and Electronic Engineering, National Central University, Chung-Li
Keywords
DOCUMENT SEGMENTATION; BLOCK CLASSIFICATION; PROJECTION FEATURE; CONNECTIVITY HISTOGRAM
DOI
10.1016/0167-8655(94)90110-4
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a feature-based document analysis system that uses domain knowledge to segment and classify mixed text/graphics/image documents. In our approach, we first perform a run-length smearing operation followed by a stripe merging procedure to segment the blocks embedded in a document. The classification task is then performed based on the domain knowledge induced from the primitives associated with each type of medium. Proper use of domain knowledge proves effective in speeding up segmentation and reducing classification error. The experimental study demonstrates the feasibility of the new technique for segmenting and classifying mixed text/graphics/image documents.
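The run-length smearing step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a binary image stored as lists of 0/1 values (1 = black), applies smearing along rows only, and fills only white gaps bounded by black pixels on both sides. The function names and the threshold value are hypothetical, and the stripe merging and classification stages are not shown.

```python
def smear_row(row, threshold):
    """Horizontally smear one row of a binary image (1 = black pixel).

    White runs of length <= threshold that lie between two black
    pixels are filled with black. Threshold is an assumed parameter.
    """
    out = list(row)
    last_black = None  # index of the most recent black pixel seen
    for i, pixel in enumerate(row):
        if pixel == 1:
            if last_black is not None:
                gap = i - last_black - 1
                if 0 < gap <= threshold:
                    # Fill the short white gap between the two black pixels.
                    for j in range(last_black + 1, i):
                        out[j] = 1
            last_black = i
    return out


def smear_image(image, threshold):
    """Apply horizontal smearing to every row of a binary image."""
    return [smear_row(row, threshold) for row in image]


if __name__ == "__main__":
    page = [
        [1, 0, 0, 1, 0, 0, 0, 0, 1, 0],
        [0, 1, 0, 1, 0, 0, 0, 0, 0, 0],
    ]
    for smeared in smear_image(page, threshold=2):
        print(smeared)
```

With a threshold of 2, closely spaced black pixels (such as characters in a text line) fuse into solid runs, while wider gaps stay white and keep neighbouring blocks apart.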
Pages: 1201-1209
Number of pages: 9
Related papers
7 in total
[1] Nagy G., 1984, Seventh International Conference on Pattern Recognition (Cat. No. 84CH2046-1), P347
[2] PAVLIDIS, T; ZHOU, JY. PAGE SEGMENTATION AND CLASSIFICATION [J]. CVGIP-GRAPHICAL MODELS AND IMAGE PROCESSING, 1992, 54 (06): 484-496
[3] ROSENFELD A, 1982, DIGITAL PICTURE PROC, V2
[4] Srihari S. N., 1986, Eighth International Conference on Pattern Recognition. Proceedings (Cat. No.86CH2342-4), P434
[5] STEPHEN WL, 1990, 10TH P INT C PATT RE, P703
[6] TOYODA J, 1982, 6TH P INT C PATT REC, P1113
[7] WONG KY, 1982, IBM J RES DEV, V6, P642