Why table ground-truthing is hard

Cited by: 17
Authors
Hu, JY [1]
Kashi, R [1]
Lopresti, D [1]
Nagy, G [1]
Wilfong, G [1]
Affiliations
[1] Avaya Inc, Avaya Labs, Murray Hill, NJ 07974 USA
Source
SIXTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, PROCEEDINGS | 2001
DOI
10.1109/ICDAR.2001.953768
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The principle that for every document analysis task there exists a mechanism for creating well-defined ground-truth is a widely held tenet. Past experience with standard datasets providing ground-truth for character recognition and page segmentation tasks supports this belief. In the process of attempting to evaluate several table recognition algorithms we have been developing, however, we have uncovered a number of serious hurdles connected with the ground-truthing of tables. This problem may, in fact, be much more difficult than it appears. We present a detailed analysis of why table ground-truthing is so hard, including the notions that there may exist more than one acceptable "truth" and/or incomplete or partial "truths."
Pages: 129-133
Page count: 3