Contextual Bag-of-Words for Visual Categorization

Cited by: 103
Authors
Li, Teng [1]
Mei, Tao [2]
Kweon, In-So [1]
Hua, Xian-Sheng [2]
Affiliations
[1] Korea Advanced Institute of Science and Technology, Department of Electrical Engineering, Taejon 305701, South Korea
[2] Microsoft Research Asia, Beijing 100190, People's Republic of China
Keywords
Bag-of-words; conceptual relation; local patches context; neighboring relation; discovering objects; scene; representation; classification; features; shape
DOI
10.1109/TCSVT.2010.2041828
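Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract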
Bag-of-words (BOW), which represents an image by the histogram of local patches on the basis of a visual vocabulary, has attracted intensive attention in visual categorization due to its good performance and flexibility. Conventional BOW neglects the contextual relations between local patches due to its Naive Bayesian assumption. However, it is well known that contextual relations play an important role when human beings recognize visual categories from their local appearance. This paper proposes a novel contextual bag-of-words (CBOW) representation to model two typical kinds of contextual relations between local patches, i.e., a semantic conceptual relation and a spatial neighboring relation. To model the semantic conceptual relation, visual words are grouped on multiple semantic levels according to the similarity of the class distributions induced by them; local patches are then encoded and images represented accordingly. To explore the spatial neighboring relation, an automatic term extraction technique is adopted to measure the confidence that neighboring visual words are relevant. Word groups with high relevance are used and their statistics are incorporated into the BOW representation. Classification is performed using a support vector machine with an efficient kernel that incorporates the relational information. The proposed approach is extensively evaluated on two kinds of visual categorization tasks, i.e., video event and scene categorization. Experimental results demonstrate the importance of contextual relations between local patches, and CBOW outperforms conventional BOW.
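The conventional BOW representation that the abstract takes as its baseline can be illustrated with a minimal sketch. This is not the authors' CBOW method: it only shows the plain histogram step, assuming a visual vocabulary is already available. The function names, the toy two-dimensional descriptors, and the choice of squared Euclidean distance and L1 normalization are all hypothetical choices made for illustration.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bow_histogram(descriptors, vocabulary):
    """Return an L1-normalized histogram of visual-word occurrences.

    Each local patch descriptor is assigned to its nearest visual word
    (hard assignment). The names and distance measure are illustrative
    assumptions, not taken from the paper.
    """
    if not descriptors:
        return [0.0] * len(vocabulary)
    counts = Counter(
        min(range(len(vocabulary)),
            key=lambda k: squared_distance(d, vocabulary[k]))
        for d in descriptors
    )
    total = len(descriptors)
    return [counts[k] / total for k in range(len(vocabulary))]


# Toy data (hypothetical): three visual words and four patch descriptors.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.7, 5.2)]
histogram = bow_histogram(descriptors, vocabulary)
print(histogram)
```

Because each patch is counted independently of its neighbors, the histogram discards exactly the contextual relations that the abstract says CBOW is designed to model.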
Pages: 381-392
Number of pages: 12