Word spotting for historical documents

Cited by: 232
Authors
Rath, Tony M. [1]
Manmatha, R. [1]
Affiliation
[1] Univ Massachusetts, Multimedia Indexing & Retrieval Grp, Ctr Intelligent Informat Retrieval, Amherst, MA 01003 USA
DOI
10.1007/s10032-006-0027-8
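Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 [Pattern recognition and intelligent systems]; 0812 [Computer science and technology]; 0835 [Software engineering]; 1405 [Intelligence science and technology];
Abstract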
Searching and indexing historical handwritten collections is a very challenging problem. We describe an approach called word spotting, which groups word images into clusters of similar words by using image matching to measure similarity. By annotating "interesting" clusters, an index that links words to the locations where they occur can be built automatically. Image similarities computed using a number of different techniques, including dynamic time warping, are compared. The word similarities are then used for clustering with both K-means and agglomerative clustering techniques. It is shown on a subset of the George Washington collection that such a word spotting technique can outperform a Hidden Markov Model word-based recognition technique in terms of word error rate.
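The matching-then-clustering idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: this record does not describe the paper's actual features, local cost, or warping constraints, so the sketch treats each word image as a hypothetical 1-D profile (for example, one value per pixel column), uses plain unconstrained dynamic time warping with an absolute-difference cost, and stands in a simple single-linkage agglomerative merge with a distance threshold for the clustering step. The names `dtw_distance` and `cluster_by_threshold` are illustrative only.

```python
import math


def dtw_distance(a, b):
    """Unconstrained DTW distance between two 1-D sequences.

    Local cost is the absolute difference; this is an illustrative choice,
    not necessarily the cost used in the paper.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] also matches an earlier part of b
                cost[i][j - 1],      # b[j-1] also matches an earlier part of a
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


def cluster_by_threshold(profiles, threshold):
    """Single-linkage agglomerative clustering on DTW distances.

    Repeatedly merges any two clusters whose closest members are within
    `threshold` (a hypothetical tuning parameter). Returns lists of indices.
    """
    clusters = [[i] for i in range(len(profiles))]
    merged = True
    while merged:
        merged = False
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                link = min(
                    dtw_distance(profiles[i], profiles[j])
                    for i in clusters[x]
                    for j in clusters[y]
                )
                if link <= threshold:
                    clusters[x].extend(clusters.pop(y))
                    merged = True
                    break
            if merged:
                break
    return [sorted(c) for c in clusters]


# Toy profiles: 0 and 1 are the same shape at different widths, as are 2 and 3.
profiles = [
    [0, 1, 3, 1, 0],
    [0, 1, 1, 3, 1, 0],
    [5, 5, 0, 5, 5],
    [5, 0, 5, 5],
]
groups = cluster_by_threshold(profiles, threshold=1.0)
```

In this toy setup, each resulting group plays the role of a cluster of similar word images; annotating one group with its transcription would label every word occurrence in it, which is how the index described in the abstract could be built. A real system would operate on multi-dimensional features extracted from segmented word images and would likely constrain the warping path.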
Pages: 139-152
Page count: 14