Semantic hashing

被引:764
作者
Salakhutdinov, Ruslan [1 ]
Hinton, Geoffrey [1 ]
机构
[1] Univ Toronto, Dept Comp Sci, Toronto, ON M5S 3G4, Canada
基金
加拿大创新基金会; 加拿大自然科学与工程研究理事会;
关键词
Information retrieval; Graphical models; Unsupervised learning;
D O I
10.1016/j.ijar.2008.11.006
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We show how to learn a deep graphical model of the word-count vectors obtained from a large set of documents. The values of the latent variables in the deepest layer are easy to infer and give a much better representation of each document than Latent Semantic Analysis. When the deepest layer is forced to use a small number of binary variables (e.g. 32), the graphical model performs "semantic hashing": Documents are mapped to memory addresses in such a way that semantically similar documents are located at nearby addresses. Documents similar to a query document can then be found by simply accessing all the addresses that differ by only a few bits from the address of the query document. This way of extending the efficiency of hash-coding to approximate matching is Much faster than locality sensitive hashing, which is the fastest current method. By using semantic hashing to filter the documents given to TF-IDF, we achieve higher accuracy than applying TF-IDF to the entire document set. (C) 2008 Elsevier Inc. All rights reserved.
引用
收藏
页码:969 / 978
页数:10
相关论文
共 23 条
[1]  
Andoni A, 2006, ANN IEEE SYMP FOUND, P459
[2]  
[Anonymous], 2002, ADV NEURAL INFORM PR
[3]  
[Anonymous], 2008, P IEEE C COMP VIS PA
[4]  
Bengio Y., 2007, Scaling learning algo
[5]   Latent Dirichlet allocation [J].
Blei, DM ;
Ng, AY ;
Jordan, MI .
JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (4-5) :993-1022
[6]   Learning a similarity metric discriminatively, with application to face verification [J].
Chopra, S ;
Hadsell, R ;
LeCun, Y .
2005 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOL 1, PROCEEDINGS, 2005, :539-546
[7]  
DATAR I, 2004, COMPGEOM
[8]  
DEERWESTER S, 1990, J AM SOC INFORM SCI, V41, P391, DOI 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO
[9]  
2-9
[10]  
Friedman J. H, 1975, ALGORITHM FINDING BE