Dimensionality reduction for documents with nearest neighbor queries

Cited by: 30
Authors
Ingram, Stephen [1]
Munzner, Tamara [1]
Affiliation
[1] Univ British Columbia, Vancouver, BC V5Z 1M9, Canada
Keywords
Dimensionality reduction; Document visualization; Principal component analysis; Algorithm
DOI
10.1016/j.neucom.2014.07.073
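Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code
140502 [Artificial intelligence]
Abstract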
Document collections are often stored as sets of sparse, high-dimensional feature vectors. Performing dimensionality reduction (DR) on such high-dimensional datasets for the purposes of visualization presents algorithmic and qualitative challenges for existing DR techniques. We propose the Q-SNE algorithm for dimensionality reduction of document data, combining the scalable probability-based layout approach of BH-SNE with an improved component to calculate approximate nearest neighbors, using the query-based APQ approach that exploits an impact-ordered inverted file. We provide thorough experimental evidence that Q-SNE yields substantial quality improvements for layouts of large document collections with commensurate speed. Our experiments were conducted with six real-world benchmark datasets that range up to millions of documents and terms, and compare against three alternatives for nearest neighbor search and five alternatives for dimensionality reduction. (C) 2014 Elsevier B.V. All rights reserved.
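As an illustration of the general idea named in the abstract, the following is a minimal sketch of approximate nearest neighbor search over an impact-ordered inverted file. It is not the APQ or Q-SNE implementation from the paper: the function names (normalize, build_index, approx_neighbors), the `budget` parameter, and the stopping rule are all assumptions made for this example. It assumes non-negative term weights and unit-length document vectors, so the accumulated score approximates cosine similarity.

```python
from collections import defaultdict
import heapq
import math


def normalize(vec):
    """Scale a sparse {term: weight} vector to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else {}


def build_index(docs):
    """Build an inverted file; each posting list is sorted by descending weight."""
    index = defaultdict(list)
    for doc_id, vec in enumerate(docs):
        for term, w in vec.items():
            index[term].append((w, doc_id))
    for postings in index.values():
        postings.sort(reverse=True)  # highest-impact postings first
    return index


def approx_neighbors(index, docs, query_id, k, budget):
    """Return up to k (score, doc_id) pairs, reading at most `budget` postings.

    Postings are visited in order of decreasing contribution
    (query weight * document weight), so a small budget still sees
    the largest partial products first. `budget` is a hypothetical
    knob for this sketch, not a parameter taken from the paper.
    """
    query = docs[query_id]
    heap = []  # entries: (-contribution, term, position in posting list)
    for term, qw in query.items():
        postings = index[term]
        if postings:
            heapq.heappush(heap, (-qw * postings[0][0], term, 0))

    scores = defaultdict(float)
    read = 0
    while heap and read < budget:
        neg_contrib, term, pos = heapq.heappop(heap)
        _, doc_id = index[term][pos]
        read += 1
        if doc_id != query_id:  # a document is not its own neighbor
            scores[doc_id] += -neg_contrib
        nxt = pos + 1
        if nxt < len(index[term]):
            heapq.heappush(heap, (-query[term] * index[term][nxt][0], term, nxt))

    return heapq.nlargest(k, ((s, d) for d, s in scores.items()))


docs = [
    normalize({"a": 1.0, "b": 1.0}),
    normalize({"a": 1.0}),
    normalize({"c": 1.0}),
    normalize({"a": 1.0, "b": 1.0, "c": 0.1}),
]
index = build_index(docs)
neighbors = approx_neighbors(index, docs, query_id=0, k=2, budget=100)
```

With a budget large enough to read every relevant posting, the scores equal exact cosine similarities; lowering the budget trades accuracy for speed. In a pipeline like the one the abstract describes, neighbor lists of this kind would then be handed to a BH-SNE-style layout step, which is not shown here.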
Pages: 557-569
Number of pages: 13