PAGE SEGMENTATION AND IDENTIFICATION FOR INTELLIGENT SIGNAL-PROCESSING

Cited by: 9
Authors
FAN, KC
WANG, LS
WANG, YK
Affiliation
[1] Institute of Computer Science and Information Engineering, National Central University, Chung-Li
Keywords
DOCUMENT ANALYSIS; ALTERNATIVE TREE REPRESENTATION; CONNECTIVITY HISTOGRAM; MULTIRESOLUTION FEATURES
DOI
10.1016/0165-1684(95)00061-H
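Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification codes
0808; 0809
Abstract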
Document analysis plays an important role in office automation, especially in intelligent signal processing. In this paper, we propose an intelligent document analysis system that segments a document into blocks and identifies each block. The proposed system consists of two modules: block segmentation and block identification. We first segment a document into non-overlapping blocks using a novel recursive segmentation technique and then extract the features embedded in each segmented block. Two kinds of features are extracted: connectivity histograms and multiresolution features. These features are shown to be effective in characterizing document blocks. Finally, the identification module uses a two-layer perceptron to determine the identity (type) of each block. Experiments with a wide variety of documents confirm the feasibility of our approach.
Pages: 329-346
Page count: 18