A hierarchical classification strategy for digital documents

Cited by: 24
Authors
Schettini, R
Brambilla, C
Ciocca, G
Valsasna, A
De Ponti, M
Affiliations
[1] CNR, ITIM, I-20131 Milan, Italy
[2] CNR, IAMI, I-20131 Milan, Italy
[3] ST Microelect TPA Grp, Printer Div, I-20041 Agrate Brianza, Italy
Keywords
CART methodology; compound documents; graphics; image classification; low-level features; photographs; texts
DOI
10.1016/S0031-3203(01)00168-6
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The effective classification of image contents allows us to adopt strategies that can meet the increasing demand for quality, speed, and ease of use in imaging applications. We report here on our experience in the use of CART classifiers for the classification of images indexed by low-level perceptual features such as color, texture, and shape. The problem addressed is the complex matter of distinguishing among photographs, graphics, texts, and compound documents. To cope with the great variety of compound documents, we have designed a hierarchical classification strategy which first classifies images as compound or non-compound by verifying the homogeneity of the sub-images in terms of low-level features. Non-compound images are then classified as photographs, graphics, or texts. The results are reported and discussed. © 2002 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
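The two-stage strategy described in the abstract can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the two toy features (mean intensity and edge density), the 2 x 2 tiling, the `max_spread` threshold, and the hand-written decision rules, which merely stand in for a trained CART classifier.

```python
from statistics import mean, pstdev


def tile_features(tile):
    """Return (mean intensity, edge density) for a 2-D list of grey levels (0..255)."""
    pixels = [p for row in tile for p in row]
    # Edge density: share of horizontal neighbour pairs differing by more than 32.
    pairs = [(a, b) for row in tile for a, b in zip(row, row[1:])]
    edges = sum(1 for a, b in pairs if abs(a - b) > 32)
    return mean(pixels), edges / len(pairs)


def split_into_tiles(image, n):
    """Split a 2-D list into an n x n grid of sub-images (row-major order)."""
    th, tw = len(image) // n, len(image[0]) // n
    return [
        [row[c * tw:(c + 1) * tw] for row in image[r * th:(r + 1) * th]]
        for r in range(n)
        for c in range(n)
    ]


def is_compound(image, n=2, max_spread=0.2):
    """Stage 1: call the image compound when its sub-images are not homogeneous.

    max_spread is a hypothetical threshold, not a value from the paper.
    """
    feats = [tile_features(t) for t in split_into_tiles(image, n)]
    intensity_spread = pstdev([f[0] / 255 for f in feats])
    edge_spread = pstdev([f[1] for f in feats])
    return max(intensity_spread, edge_spread) > max_spread


def classify_non_compound(image):
    """Stage 2: hand-written rules standing in for a trained CART tree."""
    _, edge_density = tile_features(image)
    grey_levels = len({p for row in image for p in row})
    if grey_levels <= 4:
        return "text" if edge_density > 0.3 else "graphic"
    return "photograph"


def classify(image):
    """Hierarchical decision: compound check first, then the three-way split."""
    if is_compound(image):
        return "compound"
    return classify_non_compound(image)


# Tiny synthetic 8 x 8 images used only to exercise the sketch.
text_like = [[0, 255] * 4 for _ in range(8)]
flat_graphic = [[200] * 8 for _ in range(8)]
smooth_photo = [[100 + r * 8 + c for c in range(8)] for r in range(8)]
mixed_page = [[0, 255, 0, 255, 200, 200, 200, 200] for _ in range(8)]

for name, img in [("text_like", text_like), ("flat_graphic", flat_graphic),
                  ("smooth_photo", smooth_photo), ("mixed_page", mixed_page)]:
    print(name, "->", classify(img))
```

The point of the hierarchy is that the compound check runs on sub-image statistics before any content label is assigned, so the three-way classifier only ever sees images whose regions look alike. In practice the rules in `classify_non_compound` would be learned from labeled feature vectors rather than written by hand.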
Pages: 1759-1769
Number of pages: 11