The IAM-database: An English sentence database for offline handwriting recognition

Cited by: 808
Authors
U.-V. Marti
H. Bunke
Affiliation
[1] Department of Computer Science, University of Bern, 3011 Bern
Funding
EU Horizon 2020;
Keywords
Corpus; Database; Handwriting recognition; Linguistic knowledge; Unconstrained English sentences;
DOI
10.1007/s100320200071
Abstract
In this paper we describe a database that consists of handwritten English sentences. It is based on the Lancaster-Oslo/Bergen (LOB) corpus. This corpus is a collection of texts that comprise about one million word instances. The database includes 1,066 forms produced by approximately 400 different writers. A total of 82,227 word instances out of a vocabulary of 10,841 words occur in the collection. The database consists of full English sentences. It can serve as a basis for a variety of handwriting recognition tasks. However, it is expected that the database would be particularly useful for recognition tasks where linguistic knowledge beyond the lexicon level is used, because this knowledge can be automatically derived from the underlying corpus. The database also includes a few image-processing procedures for extracting the handwritten text from the forms and the segmentation of the text into lines and words. © 2002 Springer-Verlag Berlin Heidelberg.
Pages: 39-46
Number of pages: 7