Table structure understanding and its performance evaluation

Cited by: 59
Authors
Wang, YL
Phillips, IT
Haralick, RM
Affiliations
[1] Univ Washington, Dept Elect Engn, Seattle, WA 98195 USA
[2] CUNY Queens Coll, Dept Comp Sci, Flushing, NY 11367 USA
[3] CUNY, Grad Sch, New York, NY 10016 USA
Keywords
pattern recognition; document image analysis; document layout analysis; table structure understanding; performance evaluation; non-parametric statistical modeling; optimization
DOI
10.1016/j.patcog.2004.01.012
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a table structure understanding algorithm designed using optimization methods. The algorithm is probability based: the probabilities are estimated from geometric measurements made on the various entities in a large training set. The methodology includes a global parameter optimization scheme, a novel automatic table ground truth generation system, and a table structure understanding performance evaluation protocol. On a document data set containing 518 table entities and 10,934 cell entities, the algorithm achieved an accuracy of 96.76% at the cell level and 98.32% at the table level. © 2004 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
Pages: 1479-1497
Page count: 19