Visualizing Large-scale and High-dimensional Data

Cited by: 280
Authors
Tang, Jian [1]
Liu, Jingzhou [1, 2]
Zhang, Ming [2]
Mei, Qiaozhu [3]
Affiliations
[1] Microsoft Research Asia, Beijing, People's Republic of China
[2] Peking University, Beijing, People's Republic of China
[3] University of Michigan, Ann Arbor, MI 48109, USA
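Source
Proceedings of the 25th International Conference on World Wide Web (WWW'16) | 2016
Funding
National Science Foundation (USA); National Natural Science Foundation of China
Keywords
Visualization; big data; high-dimensional data; trees
DOI
10.1145/2872427.2883041
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
080201 [Mechanical Manufacturing and Automation]
Abstract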
We study the problem of visualizing large-scale and high-dimensional data in a low-dimensional (typically 2D or 3D) space. Much success has been reported recently by techniques that first compute a similarity structure of the data points and then project them into a low-dimensional space with the structure preserved. These two steps suffer from considerable computational costs, preventing state-of-the-art methods such as t-SNE from scaling to large-scale and high-dimensional data (e.g., millions of data points and hundreds of dimensions). We propose LargeVis, a technique that first constructs an accurately approximated K-nearest neighbor graph from the data and then lays out the graph in the low-dimensional space. Compared to t-SNE, LargeVis significantly reduces the computational cost of the graph construction step and employs a principled probabilistic model for the visualization step, the objective of which can be effectively optimized through asynchronous stochastic gradient descent with a linear time complexity. The whole procedure thus easily scales to millions of high-dimensional data points. Experimental results on real-world data sets demonstrate that LargeVis outperforms the state-of-the-art methods in both efficiency and effectiveness. The hyper-parameters of LargeVis are also much more stable over different data sets.
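The two-step pipeline described in the abstract (build a K-nearest neighbor graph, then lay it out in 2D with stochastic gradient updates) can be illustrated with a toy sketch. This is not the authors' implementation, and several parts are assumptions made only for illustration: the graph is built by exact brute force rather than by an approximate method, the updates run in a single thread rather than asynchronously, the edge probability is assumed to have the form 1 / (1 + d^2), and the names `knn_graph`, `layout`, `n_neg`, and `gamma` are hypothetical.

```python
import numpy as np


def knn_graph(X, k):
    """Exact K-nearest-neighbor indices by brute force.

    Assumption: a stand-in for the approximate graph construction
    mentioned in the abstract; it costs O(n^2) and will not scale.
    """
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


def layout(neighbors, dim=2, n_steps=20000, n_neg=5, lr=1.0, gamma=7.0, seed=0):
    """Toy single-threaded SGD layout of a KNN graph.

    Assumption: an edge (i, j) is observed with probability
    1 / (1 + ||y_i - y_j||^2). Each step samples one edge, pulls its
    endpoints together, and pushes y_i away from n_neg random points.
    """
    rng = np.random.default_rng(seed)
    n, k = neighbors.shape
    Y = rng.normal(scale=1e-2, size=(n, dim))
    for step in range(n_steps):
        step_lr = lr * (1.0 - step / n_steps)  # linearly decaying step size
        i = rng.integers(n)
        j = neighbors[i, rng.integers(k)]

        # Attraction: ascend log p = -log(1 + d^2) for the sampled edge.
        diff = Y[i] - Y[j]
        g = np.clip(-2.0 * diff / (1.0 + diff @ diff), -5.0, 5.0)
        Y[i] += step_lr * g
        Y[j] -= step_lr * g

        # Repulsion: ascend gamma * log(1 - p) for random negative samples.
        for _ in range(n_neg):
            m = rng.integers(n)
            if m == i:
                continue
            diff = Y[i] - Y[m]
            d2 = diff @ diff
            g = gamma * 2.0 * diff / ((d2 + 1e-3) * (1.0 + d2))
            Y[i] += step_lr * np.clip(g, -5.0, 5.0)
    return Y
```

Calling `layout(knn_graph(X, k=10))` on an `(n, d)` array `X` returns an `(n, 2)` array of coordinates. The per-step cost does not depend on `n`, which is the property that makes sampling-based layouts attractive at scale; the brute-force graph step here does not share that property.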
Pages: 287-297
Page count: 11