Fine-grained document genre classification using first order random graphs

Cited by: 27
Authors
Bagdanov, AD [1]
Worring, M [1]
Affiliations
[1] Univ Amsterdam, Intelligent Sensory Informat Syst, NL-1098 SJ Amsterdam, Netherlands
Source
SIXTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, PROCEEDINGS | 2001
DOI
10.1109/ICDAR.2001.953759
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We approach the general problem of classifying machine-printed documents into genres. Layout is a critical factor in recognizing fine-grained genres, as document content features are similar. Document genre is determined from the layout structure detected from scanned binary images of the document pages, using no OCR results and minimal a priori knowledge of document logical structures. Our method uses attributed relational graphs (ARGs) to represent the layout structure of document instances, and first order random graphs (FORGs) to represent document genres. In this paper we develop our FORG-based genre classification method and present a comparative evaluation between our technique and a variety of statistical pattern classifiers. FORGs are capable of modeling common layout structure within a document genre and are shown to significantly outperform traditional pattern classification techniques when fine-grained genre distinctions must be drawn.
Pages: 79-83
Page count: 3