Automated optimized parameters for T-distributed stochastic neighbor embedding improve visualization and analysis of large datasets

Cited by: 457
Authors
Belkina, Anna C. [1, 2]
Ciccolella, Christopher O. [3 ]
Anno, Rina [4 ]
Halpert, Richard [5 ]
Spidlen, Josef [5 ]
Snyder-Cappione, Jennifer E. [2, 6]
Affiliations
[1] Boston Univ, Sch Med, Dept Pathol & Lab Med, Boston, MA 02118 USA
[2] Boston Univ, Sch Med, Flow Cytometry Core Facil, Boston, MA 02118 USA
[3] Omiq Inc, Santa Clara, CA 95050 USA
[4] Kansas State Univ, Dept Math, Manhattan, KS 66506 USA
[5] BD Life Sci FlowJo, Ashland, OR 97520 USA
[6] Boston Univ, Sch Med, Dept Microbiol, Boston, MA 02118 USA
Keywords
MASS CYTOMETRY; FLOW-CYTOMETRY; CELLS; IMMUNE; HETEROGENEITY; FLUORESCENCE; PANEL;
DOI
10.1038/s41467-019-13055-y
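Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes
070301 [Inorganic chemistry]; 070403 [Astrophysics]; 070507 [Natural resources and territorial spatial planning]; 090105 [Crop production systems and ecological engineering];
Abstract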
Accurate and comprehensive extraction of information from high-dimensional single cell datasets necessitates faithful visualizations to assess biological populations. A state-of-the-art algorithm for non-linear dimension reduction, t-SNE, requires multiple heuristics and fails to produce clear representations of datasets when millions of cells are projected. We develop opt-SNE, an automated toolkit for t-SNE parameter selection that utilizes Kullback-Leibler divergence evaluation in real time to tailor the early exaggeration and overall number of gradient descent iterations in a dataset-specific manner. The precise calibration of early exaggeration together with opt-SNE adjustment of gradient descent learning rate dramatically improves computation time and enables high-quality visualization of large cytometry and transcriptomics datasets, overcoming limitations of analysis tools with hard-coded parameters that often produce poorly resolved or misleading maps of fluorescent and mass cytometry data. In summary, opt-SNE enables superior data resolution in t-SNE space and thereby more accurate data interpretation.
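The abstract describes monitoring Kullback-Leibler divergence during gradient descent to decide when to end early exaggeration and when to stop iterating. The sketch below illustrates that general idea on a recorded KL trace. It is not the opt-SNE implementation: the function names, the peak-of-relative-change rule, the stopping tolerance, and the learning-rate formula are all assumptions made for illustration, since this record does not state the exact criteria.

```python
def relative_change(prev_kl, curr_kl):
    """Relative KL decrease between two successive checks, in percent."""
    return 100.0 * (prev_kl - curr_kl) / prev_kl


def find_exaggeration_stop(kl_trace):
    """Return the index in kl_trace at which to end early exaggeration.

    Assumed rule (illustrative only): stop at the first check where the
    relative KL decrease drops below the previous check's decrease,
    i.e. just past the peak of the rate of change.
    """
    prev_rate = None
    for i in range(1, len(kl_trace)):
        rate = relative_change(kl_trace[i - 1], kl_trace[i])
        if prev_rate is not None and rate < prev_rate:
            return i
        prev_rate = rate
    # No peak observed: fall back to the last recorded check.
    return len(kl_trace) - 1


def should_stop(prev_kl, curr_kl, tolerance=1e-4):
    """Assumed stopping rule: the KL improvement is below tolerance * KL."""
    return (prev_kl - curr_kl) < tolerance * curr_kl


def assumed_learning_rate(n_points, exaggeration):
    """Hypothetical heuristic: scale the step size with dataset size."""
    return n_points / exaggeration


# Hypothetical KL values recorded at successive checks during exaggeration.
kl_trace = [5.0, 4.9, 4.5, 3.8, 3.5, 3.4]
stop_index = find_exaggeration_stop(kl_trace)
```

In this hypothetical trace the relative decrease grows over the first three checks and then shrinks, so `stop_index` points at the first check after the peak. A real t-SNE run would compute the KL values from the embedding rather than from a fixed list.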
Pages: 12