Evaluating the performance of table processing algorithms

Cited by: 49
Authors
Hu J. [1]
Kashi R.S. [1]
Lopresti D. [2]
Wilfong G.T. [2]
Affiliations
[1] Avaya Labs Research, Basking Ridge, NJ 07920
[2] Bell Laboratories, Lucent Technologies, Murray Hill, NJ 07974
Keywords
Document layout analysis; Document understanding; Edit distance; Graph matching; Performance evaluation; Table detection; Table recognition
DOI
10.1007/s100320200074
Abstract
While techniques for evaluating the performance of lower-level document analysis tasks such as optical character recognition have gained acceptance in the literature, attempts to formalize the problem for higher-level algorithms, while receiving a fair amount of attention in terms of theory, have generally been less successful in practice, perhaps owing to their complexity. In this paper, we introduce intuitive, easy-to-implement evaluation schemes for the related problems of table detection and table structure recognition. We also present the results of several small experiments, demonstrating how well the methodologies work and the useful sorts of feedback they provide. We first consider the table detection problem. Here algorithms can yield various classes of errors, including non-table regions improperly labeled as tables (insertion errors), tables missed completely (deletion errors), larger tables broken into a number of smaller ones (splitting errors), and groups of smaller tables combined to form larger ones (merging errors). This leads naturally to the use of an edit distance approach for assessing the results of table detection. Next we address the problem of evaluating table structure recognition. Our model is based on a directed acyclic attribute graph, or table DAG. We describe a new paradigm, "graph probing," for comparing the results returned by the recognition system and the representation created during ground-truthing. Probing is in fact a general concept that could be applied to other document recognition tasks as well. © 2002 Springer-Verlag Berlin Heidelberg.
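The four classes of table detection error named in the abstract can be illustrated with a minimal sketch. This is not the edit distance procedure the paper proposes; it is a simpler overlap count, under the assumption that each table is represented as an inclusive (start, end) range of text lines. The function names `overlaps` and `classify_detection_errors` are hypothetical and chosen only for this illustration.

```python
from collections import Counter

def overlaps(a, b):
    """Return True if two inclusive (start, end) line ranges share a line."""
    return a[0] <= b[1] and b[0] <= a[1]

def classify_detection_errors(truth, detected):
    """Count error classes by overlap between ground-truth and detected tables.

    Assumption (not from the paper): a table is an inclusive line range, and
    any shared line counts as overlap. No edit costs are computed here.
    """
    counts = Counter()
    for t in truth:
        hits = [d for d in detected if overlaps(t, d)]
        if not hits:
            counts["deletion"] += 1   # table missed completely
        elif len(hits) > 1:
            counts["split"] += 1      # one table broken into several
    for d in detected:
        hits = [t for t in truth if overlaps(t, d)]
        if not hits:
            counts["insertion"] += 1  # non-table region labeled as a table
        elif len(hits) > 1:
            counts["merge"] += 1      # several tables combined into one
    return dict(counts)

# Hypothetical ground truth and detector output, as line ranges.
truth = [(1, 10), (20, 25), (30, 35), (50, 60)]
detected = [(1, 4), (6, 10), (20, 35), (70, 75)]
print(classify_detection_errors(truth, detected))
```

In this made-up example each error class occurs exactly once: the first table is split, the second and third are merged, the fourth is deleted, and the last detected region is an insertion.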
Pages: 140-153
Page count: 13
Related papers
21 in total
[1]  
Agne S., Rogger M., Rohrschneider J., Benchmarking of document page segmentation, Proc. Document Recognition and Retrieval VII, 3967, pp. 165-171, (2000)
[2]  
Garey M.R., Johnson D.S., Computers and intractability: A guide to the theory of NP-completeness, (1979)
[3]  
Hu J., Kashi R., Lopresti D., Wilfong G., Medium-independent table detection, Proc. Document Recognition and Retrieval VII, 3967, pp. 291-302, (2000)
[4]  
Hu J., Kashi R., Lopresti D., Wilfong G., A system for understanding and reformulating tables, Proc. 4th IAPR International Workshop on Document Analysis Systems, pp. 361-372, (2000)
[5]  
Hu J., Kashi R., Lopresti D., Wilfong G., Table structure recognition and its evaluation, Proc. Document Recognition and Retrieval VIII, 4307, pp. 44-55, (2001)
[6]  
Ishitani Y., Model matching based on association graph for form image understanding, Proc. 3rd International Conference on Document Analysis and Recognition, pp. 287-292, (1995)
[7]  
Jain A., Dubes R.C., Algorithms for clustering data, (1988)
[8]  
Kanai J., Automated performance evaluation of document image analysis systems: Issues and practice, Intl J Imaging Sci Technol, 7, pp. 363-369, (1996)
[9]  
Lopresti D., Nagy G., Automated table processing: An (opinionated) survey, Proc. 3rd IAPR International Workshop on Graphics Recognition, pp. 109-134, (1999)
[10]  
Lopresti D., Nagy G., Issues in ground-truthing graphic documents, Proc. 4th IAPR International Workshop on Graphics Recognition, pp. 59-72, (2001)