A script-independent methodology for optical character recognition

Cited by: 30
Authors
Makhoul, J [1]
Schwartz, R [1]
Lapre, C [1]
Bazzi, I [1]
Affiliations
[1] BBN Syst & Technol Corp, GTE Internetworking, Cambridge, MA 02138 USA
Keywords
optical character recognition; speech recognition; hidden Markov models; segmentation-free recognition; script independence; Arabic OCR
DOI
10.1016/S0031-3203(97)00152-0
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We present a methodology for OCR that exhibits the following properties: script-independent feature extraction, training, and recognition components; no separate segmentation at the character and word levels; and automatic training on data that is likewise not presegmented. The methodology is adapted to OCR from continuous speech recognition, which has developed a mature and successful technology based on Hidden Markov Models. The script independence of the methodology is demonstrated using omnifont experiments on the DARPA Arabic OCR Corpus and the University of Washington English Document Image Database I. © 1998 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
Pages: 1285-1294
Page count: 10
Related papers
36 in total
[1]   Text page recognition using grey-level features and hidden markov models [J].
Aas, K ;
Eikvil, L .
PATTERN RECOGNITION, 1996, 29 (06) :977-985
[2]   HIDDEN MARKOV MODEL-BASED OPTICAL CHARACTER-RECOGNITION IN THE PRESENCE OF DETERMINISTIC TRANSFORMATIONS [J].
AGAZZI, OE ;
KUO, SS .
PATTERN RECOGNITION, 1993, 26 (12) :1813-1826
[3]   SURVEY AND BIBLIOGRAPHY OF ARABIC OPTICAL TEXT RECOGNITION [J].
ALBADR, B ;
MAHMOUD, SA .
SIGNAL PROCESSING, 1995, 41 (01) :49-77
[4]  
ALLAM M, 1995, P SOC PHOTO-OPT INS, V2422, P228, DOI 10.1117/12.205825
[5]   Hidden Markov models in text recognition [J].
Anigbogu, JC ;
Belaid, A .
INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 1995, 9 (06) :925-958
[6]  
BELLEGARDA J, 1989, IEEE INT C AC SPEECH, V1, P13
[7]  
BENAMARA N, 1996, 13 INT C PATT REC VI, V2, P220
[8]   CONNECTED AND DEGRADED TEXT RECOGNITION USING HIDDEN MARKOV MODEL [J].
BOSE, CB ;
KUO, SS .
PATTERN RECOGNITION, 1994, 27 (10) :1345-1363
[9]   OFF-LINE CURSIVE HANDWRITING RECOGNITION USING HIDDEN MARKOV-MODELS [J].
BUNKE, H ;
ROTH, M ;
SCHUKAT-TALAMAZZINI, EG .
PATTERN RECOGNITION, 1995, 28 (09) :1399-1413
[10]   A survey of methods and strategies in character segmentation [J].
Casey, RG ;
Lecolinet, E .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1996, 18 (07) :690-706