Content-lossless document image compression based on structural analysis and pattern matching

Cited by: 11
Authors
Yang, YB [1]
Yan, H [1]
Yu, DG [1]
Affiliation
[1] Univ Sydney, Dept Elect Engn, Sydney, NSW 2006, Australia
Funding
Australian Research Council
Keywords
document image analysis; structural clustering; pattern matching; image compression
DOI
10.1016/S0031-3203(99)00112-0
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a highly efficient content-lossless document image compression scheme. The method consists of three stages. First, the image is analysed and segmented into symbols and position parameters by examining the relationship between foreground and background and their connectivity. Second, an initial set of representative symbols is extracted from the symbols in the image and matched using direction-based bit-map analysis and matching; multi-stage structural clustering, together with representative pattern derivation and synthesis, then reduces this set to a final set of representative and synthetic patterns containing fewer repeated symbols. This final component set is reorganized into a compact library image. Finally, a high compression ratio is achieved by coding the relative positions of the symbols, the parameters of the representative patterns, and the library image with adaptive arithmetic coders of different orders and the Q-Coder, respectively. Our scheme achieves much better compression and a smaller error map than most alternative systems. Its lossiness can be reduced to a very low level through well-defined pattern derivation and synthesis, at the expense of compression ratio. The method guarantees content-lossless reconstruction under our symbol-level content-lossless criterion. It can easily be combined with soft pattern matching to extend it to a lossless mode. In addition, combining the method with the JBIG1 progressive mode and a less redundant component library provides content-lossless progressive transmission. The method can also handle various symbolic images, including those with nested symbols such as Chinese character images, by means of symbol segmentation based solely on connectivity and position-based bit-map reconstruction. (C) 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
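The three-stage pipeline described in the abstract can be illustrated with a much-simplified Python sketch. It is not the authors' method: it assumes 8-connectivity for segmentation, replaces the direction-based bit-map matching and multi-stage structural clustering with a greedy matcher that counts differing pixels (an XOR count) between same-sized bitmaps, records each symbol position as an offset from the previous symbol, and omits the arithmetic coding and Q-Coder stages entirely. All function names, the `PAGE` sample and the `max_mismatch` threshold are hypothetical.

```python
"""Hypothetical sketch: symbol segmentation, greedy pattern matching and
relative-position extraction for a bilevel page (no entropy coding)."""
from collections import deque
from typing import List, Tuple

Bitmap = Tuple[str, ...]          # rows of '0'/'1' characters
Symbol = Tuple[int, int, Bitmap]  # (top, left, bitmap)


def segment_symbols(img: List[str]) -> List[Symbol]:
    """Split a binary image into 8-connected components (assumed connectivity).
    Components are returned in raster order of their first pixel."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    symbols: List[Symbol] = []
    for y in range(h):
        for x in range(w):
            if img[y][x] != "1" or seen[y][x]:
                continue
            # Breadth-first flood fill collects one component's pixels.
            pixels, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == "1" and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            top = min(p[0] for p in pixels)
            left = min(p[1] for p in pixels)
            height = max(p[0] for p in pixels) - top + 1
            width = max(p[1] for p in pixels) - left + 1
            # Bounding-box bitmap holding only this component's pixels.
            grid = [["0"] * width for _ in range(height)]
            for py, px in pixels:
                grid[py - top][px - left] = "1"
            symbols.append((top, left, tuple("".join(r) for r in grid)))
    return symbols


def mismatch(a: Bitmap, b: Bitmap) -> int:
    """Number of differing pixels between two bitmaps of equal size."""
    return sum(ca != cb for ra, rb in zip(a, b) for ca, cb in zip(ra, rb))


def build_library(symbols: List[Symbol], max_mismatch: int = 0):
    """Greedy matching: reuse the first same-sized library pattern within
    max_mismatch differing pixels, otherwise add the bitmap as a new pattern."""
    library: List[Bitmap] = []
    indices: List[int] = []
    for _, _, bmp in symbols:
        for i, pat in enumerate(library):
            if (len(pat), len(pat[0])) == (len(bmp), len(bmp[0])) and mismatch(pat, bmp) <= max_mismatch:
                indices.append(i)
                break
        else:  # no existing pattern matched
            library.append(bmp)
            indices.append(len(library) - 1)
    return library, indices


def relative_positions(symbols: List[Symbol]) -> List[Tuple[int, int]]:
    """Offset of each symbol's top-left corner from the previous symbol's."""
    offsets, prev = [], (0, 0)
    for top, left, _ in symbols:
        offsets.append((top - prev[0], left - prev[1]))
        prev = (top, left)
    return offsets


def reconstruct(h: int, w: int, library, indices, offsets) -> List[str]:
    """Paste the foreground pixels of each library pattern at its decoded position."""
    out = [["0"] * w for _ in range(h)]
    top = left = 0
    for idx, (dy, dx) in zip(indices, offsets):
        top, left = top + dy, left + dx
        for r, row in enumerate(library[idx]):
            for c, bit in enumerate(row):
                if bit == "1":
                    out[top + r][left + c] = "1"
    return ["".join(r) for r in out]


# Hypothetical sample page: three identical 2x2 blobs and one vertical bar.
PAGE = ["11000110", "11000110", "00000000", "00100011", "00100011"]

if __name__ == "__main__":
    syms = segment_symbols(PAGE)
    lib, idx = build_library(syms, max_mismatch=0)
    offs = relative_positions(syms)
    print(len(syms), "symbols ->", len(lib), "library patterns")
    print("indices:", idx, "offsets:", offs)
    assert reconstruct(len(PAGE), len(PAGE[0]), lib, idx, offs) == PAGE
```

With `max_mismatch=0` only identical bitmaps share a library pattern, so the reconstruction is exact; a larger threshold shrinks the library but makes the reconstruction lossy, a simple analogue of the trade-off between lossiness and compression ratio described above. For the sample page, the script reports 4 symbols mapped to 2 library patterns.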
Pages: 1277-1293
Number of pages: 17