Large-Scale Discovery of Spatially Related Images

Cited by: 81
Authors
Chum, Ondrej [1]
Matas, Jiri [1]
Affiliations
[1] Czech Tech Univ, Fac Elect Engn, Prague 12135, Czech Republic
Keywords
minHash; image clustering; image retrieval; bag of words;
DOI
10.1109/TPAMI.2009.166
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a randomized data mining method that finds clusters of spatially overlapping images. The core of the method relies on the min-Hash algorithm for fast detection of pairs of images with spatial overlap, the so-called cluster seeds. The seeds are then used as visual queries to obtain clusters, which are formed as transitive closures of sets of partially overlapping images that include the seed. We show that the probability of finding a seed for an image cluster rapidly increases with the size of the cluster. The properties and performance of the algorithm are demonstrated on data sets with 10^4, 10^5, and 5 × 10^6 images. The speed of the method depends on the size of the database and the number of clusters. The first stage of seed generation is close to linear for database sizes up to approximately 2^34 ≈ 10^10 images. On a single 2.4 GHz PC, the clustering process took only 24 minutes for a standard database of more than 100,000 images, i.e., only 0.014 seconds per image.
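The seed-generation step described in the abstract can be illustrated with a minimal min-Hash sketch in Python. This is not the authors' implementation: it assumes each image is already represented as a non-empty set of integer visual-word ids (bag of words), and the function names (`make_hash_functions`, `sketches`, `candidate_seeds`), the affine hash family, the grouping of min-Hash values into fixed-size tuples, and the default parameter values are all hypothetical choices made for illustration. The sketch relies only on the standard property that two sets share the same min-Hash value with probability equal to their Jaccard similarity.

```python
import random
from collections import defaultdict
from itertools import combinations


def make_hash_functions(count, seed=0):
    """Return `count` random affine hash functions (an assumed hash family)."""
    rng = random.Random(seed)
    prime = 2_147_483_647  # Mersenne prime 2^31 - 1; word ids are assumed smaller
    params = [(rng.randrange(1, prime), rng.randrange(0, prime)) for _ in range(count)]
    # Bind a and b as defaults so each lambda keeps its own parameters.
    return [lambda w, a=a, b=b: (a * w + b) % prime for a, b in params]


def sketches(words, hash_functions, sketch_size):
    """Compute one min-Hash per function, then group them into tuples.

    `words` must be a non-empty collection of integer visual-word ids.
    """
    mins = [min(h(w) for w in words) for h in hash_functions]
    return [tuple(mins[i:i + sketch_size]) for i in range(0, len(mins), sketch_size)]


def candidate_seeds(images, num_sketches=64, sketch_size=2, seed=0):
    """Return pairs of image ids that share at least one identical sketch."""
    hash_functions = make_hash_functions(num_sketches * sketch_size, seed)
    buckets = defaultdict(set)
    for image_id, words in images.items():
        for index, sketch in enumerate(sketches(words, hash_functions, sketch_size)):
            # Images landing in the same bucket agree on the whole sketch.
            buckets[(index, sketch)].add(image_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


if __name__ == "__main__":
    toy_images = {
        "a": {1, 5, 9, 12},
        "b": {1, 5, 9, 12},     # same visual words as "a"
        "c": {100, 200, 300},   # no visual words in common with "a" or "b"
    }
    print(candidate_seeds(toy_images))
```

In this toy setup, hashing every image once and comparing only images that fall into the same bucket avoids an explicit all-pairs comparison. In the method described above, each candidate pair would then be used as a visual query to grow a cluster; that step is not shown here.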
Pages: 371-377
Number of pages: 7