Contextual Bag-of-Words for Visual Categorization

Cited by: 103

Authors:

Li, Teng [1]

Mei, Tao [2]

Kweon, In-So [1]

Hua, Xian-Sheng [2]

Affiliations:

[1] Korea Adv Inst Sci & Technol, Dept Elect Engn, Taejon 305701, South Korea

[2] Microsoft Res Asia, Beijing 100190, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2011 / Vol. 21 / No. 04

Keywords:

Bag-of-words; conceptual relation; local patches context; neighboring relation; DISCOVERING OBJECTS; SCENE; REPRESENTATION; CLASSIFICATION; FEATURES; SHAPE;

DOI:

10.1109/TCSVT.2010.2041828

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

Bag-of-words (BOW), which represents an image by the histogram of local patches on the basis of a visual vocabulary, has attracted intensive attention in visual categorization due to its good performance and flexibility. Conventional BOW neglects the contextual relations between local patches due to its Naive Bayesian assumption. However, it is well known that contextual relations play an important role when humans recognize visual categories from their local appearance. This paper proposes a novel contextual bag-of-words (CBOW) representation to model two kinds of typical contextual relations between local patches, i.e., a semantic conceptual relation and a spatial neighboring relation. To model the semantic conceptual relation, visual words are grouped on multiple semantic levels according to the similarity of the class distributions they induce; local patches are then encoded and images represented accordingly. To explore the spatial neighboring relation, an automatic term extraction technique is adopted to measure the confidence that neighboring visual words are relevant. Word groups with high relevance are used and their statistics are incorporated into the BOW representation. Classification is performed using a support vector machine with an efficient kernel that incorporates the relational information. The proposed approach is extensively evaluated on two kinds of visual categorization tasks, i.e., video event and scene categorization. Experimental results demonstrate the importance of contextual relations between local patches, and CBOW outperforms conventional BOW.
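
As an illustration of the conventional BOW histogram the abstract starts from, the following is a minimal sketch, not the authors' implementation. It assumes local patch descriptors and a visual vocabulary are already available as NumPy arrays. The function names `bow_histogram` and `grouped_histogram`, the toy data, and the `word_to_group` mapping are all hypothetical. In particular, the grouping here is simply given as input; the paper derives its groups from class-distribution similarity, which this sketch does not reproduce.

```python
import numpy as np


def bow_histogram(descriptors, vocabulary):
    """Assign each local descriptor to its nearest visual word and
    return the normalized histogram of word counts."""
    # Squared Euclidean distances, shape (n_patches, n_words).
    diffs = descriptors[:, None, :] - vocabulary[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # Index of the closest visual word for every patch.
    words = np.argmin(dists, axis=1)
    counts = np.bincount(words, minlength=vocabulary.shape[0]).astype(float)
    total = counts.sum()
    return counts / total if total > 0 else counts


def grouped_histogram(hist, word_to_group, n_groups):
    """Merge word bins into coarser group bins.
    word_to_group is an assumed, precomputed mapping (hypothetical)."""
    return np.bincount(word_to_group, weights=hist, minlength=n_groups)


# Toy data: three 2-D visual words and four patch descriptors.
vocabulary = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
descriptors = np.array([[0.1, 0.0], [0.9, 1.1], [1.2, 0.8], [4.8, 5.1]])

hist = bow_histogram(descriptors, vocabulary)

# Hypothetical grouping: words 0 and 1 share a group, word 2 stands alone.
word_to_group = np.array([0, 0, 1])
grouped = grouped_histogram(hist, word_to_group, n_groups=2)

print(hist)
print(grouped)
```

The plain histogram discards all relations between patches, which is the limitation the abstract attributes to conventional BOW. The grouped histogram only hints at how a coarser, group-level encoding could sit alongside the word-level one.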


Pages: 381-392

Page count: 12

