Content-based image retrieval with compact deep convolutional features

Cited by: 98
Authors
Alzu'bi, Ahmad [1 ]
Amira, Abbes [2 ]
Ramzan, Naeem [1 ]
Affiliations
[1] Univ West Scotland, Sch Engn & Comp, Paisley PA1 2BE, Renfrew, Scotland
[2] Qatar Univ, Coll Engn, POB 2713, Doha, Qatar
Keywords
CBIR; Deep learning; Convolutional neural networks; Bilinear compact pooling; Similarity matching;
DOI
10.1016/j.neucom.2017.03.072
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification code
140502 [Artificial Intelligence]
Abstract
Convolutional neural networks (CNNs) with deep learning have recently achieved remarkable success and superior performance in computer vision applications. Most CNN-based methods extract image features at the last layer of a single CNN architecture using orderless quantization approaches, which limits the utilization of intermediate convolutional layers for identifying local image patterns. As one of the first works in the context of content-based image retrieval (CBIR), this paper proposes a new bilinear CNN-based architecture using two parallel CNNs as feature extractors. The activations of convolutional layers are directly used to extract image features at various image locations and scales. The network architecture is initialized with deep CNNs sufficiently pre-trained on a large generic image dataset and then fine-tuned for the CBIR task. Additionally, an efficient bilinear root pooling is proposed and applied to the low-dimensional pooling layer to reduce image features to compact yet highly discriminative image descriptors. Finally, end-to-end training with backpropagation is performed to fine-tune the final architecture and learn its parameters for the image retrieval task. The experimental results achieved on three standard benchmarking image datasets demonstrate the outstanding performance of the proposed architecture at extracting and learning complex features for the CBIR task without prior knowledge about the semantic meta-data of images. For instance, using a very compact image vector of length 16, we achieve a retrieval accuracy of 95.7% (mAP) on Oxford 5K and 88.6% on Oxford 105K, which outperforms the best results reported by state-of-the-art approaches. Additionally, a noticeable reduction is attained in the extraction time for image features and the memory required for storage. (C) 2017 Elsevier B.V. All rights reserved.
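The bilinear pooling scheme the abstract describes can be sketched generically as follows: feature maps from two parallel extractors are combined by an outer product summed over spatial locations, then compressed with a signed square root (a common "root" normalization) and L2-normalized. This is a minimal illustration of the generic technique; the paper's specific low-dimensional bilinear root pooling layer, its learned projection, and the fine-tuned CNN weights are not reproduced here.

```python
import numpy as np

def bilinear_pool(feat_a, feat_b):
    """Bilinear pooling of two feature maps of shape (C1, H, W) and (C2, H, W):
    sum of outer products over all H*W spatial locations, followed by
    signed square root and L2 normalization."""
    c1, h, w = feat_a.shape
    c2 = feat_b.shape[0]
    a = feat_a.reshape(c1, h * w)          # (C1, H*W)
    b = feat_b.reshape(c2, h * w)          # (C2, H*W)
    bilinear = a @ b.T                     # (C1, C2): outer products summed over locations
    vec = bilinear.flatten()               # descriptor of length C1*C2
    vec = np.sign(vec) * np.sqrt(np.abs(vec))   # signed square-root ("root") normalization
    return vec / (np.linalg.norm(vec) + 1e-12)  # L2 normalization for cosine matching

# Hypothetical activations standing in for two parallel CNN feature extractors.
rng = np.random.default_rng(0)
fa = rng.standard_normal((8, 7, 7))
fb = rng.standard_normal((8, 7, 7))
desc = bilinear_pool(fa, fb)
print(desc.shape)  # (64,)
```

In the paper's setting the descriptor is further projected to very low dimension (e.g. 16) before similarity matching; the normalized vectors above can be compared directly with a dot product.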
Pages: 95-105
Page count: 11