ANATOMY OF A VERSATILE PAGE READER

Cited by: 30
Authors
BAIRD, HS
Affiliation
[1] AT&T Bell Laboratories, Murray Hill, NJ
Keywords
OPTICAL CHARACTER RECOGNITION; PAGE READING; GEOMETRIC LAYOUT ANALYSIS; SYMBOL RECOGNITION; CONTEXTUAL ANALYSIS; COMPUTATIONAL LINGUISTICS
DOI
10.1109/5.156469
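Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification code
0808; 0809
Abstract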
An experimental printed-page reader that is easy to adapt to various languages is described. Changing the target language may involve simultaneous changes in symbol sets, typefaces, sizes of text, page layouts, linguistic contexts, and imaging defects. Our strategy has been to isolate the effects of these sources of variation within separate, independent engineering subsystems. In this way, we have been able to construct, with a minimum of manual effort, classifiers for arbitrary combinations of symbols, typefaces, sizes, and imaging defects. We have tried to rid the algorithms of all language-specific rules, relying instead on automatic learning from examples and generalized table-driven methods. For some tasks we have been able to avoid language-dependency altogether: for example, for geometric page layout analysis we have found a global-to-local strategy that requires no prior knowledge of the symbol set. We can exploit linguistic context, such as that provided by dictionaries, through data-directed filtering algorithms in a uniform and modular manner, so that pre-existing tools developed by computational linguists can readily be applied. We illustrate these principles through trials on English, Swedish, Tibetan, and special technical texts.
Pages: 1059-1065
Number of pages: 7