Field data extraction for form document processing using a gravitation-based algorithm

Cited by: 8
Authors
Chen, JL [1]
Lee, HJ [1]
Affiliations
[1] Natl Chiao Tung Univ, Dept Comp Sci & Informat Engn, Hsinchu 30050, Taiwan
Keywords
form document processing; field-data grouping; gravitation-based algorithm; connected-component; locality property;
DOI
10.1016/S0031-3203(00)00115-1
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a novel approach to grouping Chinese handwritten field data filled in form documents using a gravitation-based algorithm. An algorithm is developed to extract handwritten field data which may be written out of form fields. First, form lines are extracted and removed from input form images. Connected-components are then detected from the remaining data, and the gravitation for each connected-component is computed by using the black pixel counts as their mass. Next, we move connected-components according to their gravitation. As generally known, filled-in data have the locality property, i.e., data of the same field are normally written in a local area consecutively. Therefore, the relationship of these connected-components can be determined by this property. Repeatedly moving these connected-components according to their neighbor components allows us to determine which connected-components should be extracted for a particular field. Experimental results demonstrate the effectiveness of the proposed method in grouping field data. © 2001 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
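The gravitation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract only states that black pixel counts serve as mass, so the Newton-style force law (mass product over squared distance), the use of centroids as positions, the fixed step size, and the names `Component`, `net_gravitation`, and `move_step` are all assumptions made here for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    x: float   # centroid column of the connected component (assumed position)
    y: float   # centroid row of the connected component (assumed position)
    mass: int  # black pixel count, used as mass as the abstract describes


def net_gravitation(components, g=1.0):
    """Return the net (fx, fy) pull on each component from all the others.

    Assumption: a Newton-style force g * m_i * m_j / d^2; the abstract
    does not give the exact formula.
    """
    forces = []
    for i, a in enumerate(components):
        fx = fy = 0.0
        for j, b in enumerate(components):
            if i == j:
                continue
            dx, dy = b.x - a.x, b.y - a.y
            d2 = dx * dx + dy * dy
            if d2 == 0.0:
                continue  # coincident centroids give no direction to pull in
            d = math.sqrt(d2)
            f = g * a.mass * b.mass / d2
            fx += f * dx / d
            fy += f * dy / d
        forces.append((fx, fy))
    return forces


def move_step(components, forces, step=1.0):
    """Move each component `step` pixels along its net force direction."""
    moved = []
    for c, (fx, fy) in zip(components, forces):
        norm = math.hypot(fx, fy)
        if norm == 0.0:
            moved.append(Component(c.x, c.y, c.mass))
        else:
            moved.append(
                Component(c.x + step * fx / norm, c.y + step * fy / norm, c.mass)
            )
    return moved


# Two nearby strokes and one distant stroke on the same text line.
parts = [Component(0, 0, 10), Component(3, 0, 10), Component(100, 0, 10)]
forces = net_gravitation(parts)
moved = move_step(parts, forces)
```

In this toy input the two nearby components pull each other together much more strongly than the distant one pulls either of them, which is the locality property the abstract relies on: repeating the move step would bring components of the same field together before components of other fields.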
Pages: 1741-1750
Page count: 10
Related papers
9 items in total
[1]  
Casey R., 1992, Machine Vision and Applications, V5, P143, DOI 10.1007/BF02626994
[2]   An efficient algorithm for form structure extraction using strip projection [J].
Chen, JL ;
Lee, HJ .
PATTERN RECOGNITION, 1998, 31 (09) :1353-1368
[3]   A robust algorithm for separation of Chinese characters from line drawings [J].
Chen, LH ;
Wang, JY ;
Liao, HY ;
Fan, KC .
IMAGE AND VISION COMPUTING, 1996, 14 (10) :753-761
[4]   EXTRACTION OF CHARACTERS FROM FORM DOCUMENTS BY FEATURE POINT CLUSTERING [J].
FAN, KC ;
LU, JM ;
WANG, LS ;
LIAO, HY .
PATTERN RECOGNITION LETTERS, 1995, 16 (09) :963-970
[5]   A ROBUST ALGORITHM FOR TEXT STRING SEPARATION FROM MIXED TEXT GRAPHICS IMAGES [J].
FLETCHER, LA ;
KASTURI, R .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1988, 10 (06) :910-918
[6]  
Niblack W., 1986, An introduction to image processing, P115
[7]   Automatic document processing: A survey [J].
Tang, YY ;
Lee, SW ;
Suen, CY .
PATTERN RECOGNITION, 1996, 29 (12) :1931-1952
[8]  
Taylor S. L., 1992, Machine Vision and Applications, V5, P211, DOI 10.1007/BF02626999
[9]   A generic system for form dropout [J].
Yu, B ;
Jain, AK .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1996, 18 (11) :1127-1134