Scene classification using a hybrid generative/discriminative approach

Cited by: 473

Authors:

Bosch, Anna [1]

Zisserman, Andrew [2]

Munoz, Xavier [1]

Affiliations:

[1] Univ Girona, Comp Vis & Robot Grp, Girona 17071, Spain

[2] Univ Oxford, Robot Res Grp, Oxford OX1 3PJ, England

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2008, Vol. 30, No. 4

Keywords:

scene classification; pLSA; spatial information

DOI:

10.1109/TPAMI.2007.70716

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We investigate whether dimensionality reduction using a latent generative model is beneficial for the task of weakly supervised scene classification. In detail, we are given a set of labeled images of scenes (for example, coast, forest, city, river, etc.), and our objective is to classify a new image into one of these categories. Our approach consists of first discovering latent "topics" using probabilistic Latent Semantic Analysis (pLSA), a generative model from the statistical text literature here applied to a bag of visual words representation for each image, and subsequently, training a multiway classifier on the topic distribution vector for each image. We compare this approach to that of representing each image by a bag of visual words vector directly and training a multiway classifier on these vectors. To this end, we introduce a novel vocabulary using dense color SIFT descriptors and then investigate the classification performance under changes in the size of the visual vocabulary, the number of latent topics learned, and the type of discriminative classifier used (k-nearest neighbor or SVM). We achieve superior classification performance to recent publications that have used a bag of visual word representation, in all cases, using the authors' own data sets and testing protocols. We also investigate the gain in adding spatial information. We show applications to image retrieval with relevance feedback and to scene classification in videos.
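The pipeline outlined in the abstract (pLSA topics learned from bag-of-visual-words counts, then a classifier trained on each image's topic distribution) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation. The function names `plsa_fit`, `plsa_fold_in`, and `knn_predict` are hypothetical. The input is assumed to be a precomputed image-by-visual-word count matrix, so the dense color SIFT vocabulary is not reproduced. Handling new images by "fold-in" (re-estimating P(z|d) with P(w|z) held fixed) is an assumption here, since the abstract does not say how test images are treated.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def _random_rows(rng, n_rows, n_cols):
    """Random row-stochastic matrix used to initialize EM."""
    m = rng.random((n_rows, n_cols))
    return m / m.sum(axis=1, keepdims=True)


def plsa_fit(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM on an (n_images, n_words) count matrix.

    Returns P(w|z) with shape (n_topics, n_words) and
    P(z|d) with shape (n_images, n_topics).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_w_z = _random_rows(rng, n_topics, n_words)
    p_z_d = _random_rows(rng, n_docs, n_topics)
    for _ in range(n_iter):
        # E-step: P(z|d,w) proportional to P(z|d) * P(w|z)
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        post = joint / (joint.sum(axis=1, keepdims=True) + EPS)
        # M-step: re-estimate both distributions from expected counts
        weighted = counts[:, None, :] * post
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + EPS
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + EPS
    return p_w_z, p_z_d


def plsa_fold_in(counts, p_w_z, n_iter=50, seed=0):
    """Estimate P(z|d) for new images while keeping P(w|z) fixed."""
    rng = np.random.default_rng(seed)
    p_z_d = _random_rows(rng, counts.shape[0], p_w_z.shape[0])
    for _ in range(n_iter):
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        post = joint / (joint.sum(axis=1, keepdims=True) + EPS)
        p_z_d = (counts[:, None, :] * post).sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + EPS
    return p_z_d


def knn_predict(train_x, train_y, test_x, k=3):
    """Majority vote among the k nearest training vectors (Euclidean)."""
    dists = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return np.array([np.bincount(train_y[idx]).argmax() for idx in nearest])


if __name__ == "__main__":
    # Synthetic stand-in for visual-word histograms (not real image data).
    rng = np.random.default_rng(0)
    train_counts = rng.integers(1, 10, size=(20, 30))
    train_labels = rng.integers(0, 2, size=20)
    test_counts = rng.integers(1, 10, size=(5, 30))

    p_w_z, train_topics = plsa_fit(train_counts, n_topics=4)
    test_topics = plsa_fold_in(test_counts, p_w_z)
    print(knn_predict(train_topics, train_labels, test_topics, k=3))
```

The baseline that the abstract compares against would correspond to calling `knn_predict` on the normalized count vectors directly rather than on the topic vectors. An SVM could replace the nearest-neighbor step in either variant.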


Pages: 712-727

Page count: 16
