Stochastic proximity embedding

Times cited: 110
Authors
Agrafiotis, DK [1]
Affiliation
[1] 3 Dimens Pharmaceut Inc, Exton, PA 19341 USA
Keywords
stochastic proximity embedding; multidimensional scaling; nonlinear mapping; Sammon mapping; stochastic descent; self-organizing; dimensionality reduction; feature extraction; combinatorial chemistry; data mining; data analysis; pattern recognition; molecular descriptor; molecular similarity; molecular diversity
DOI
10.1002/jcc.10234
Chinese Library Classification (CLC) number
O6 [Chemistry]
Subject classification code
0703
Abstract
We introduce stochastic proximity embedding (SPE), a novel self-organizing algorithm for producing meaningful underlying dimensions from proximity data. SPE attempts to generate low-dimensional Euclidean embeddings that best preserve the similarities between a set of related observations. The method starts with an initial configuration, and iteratively refines it by repeatedly selecting pairs of objects at random, and adjusting their coordinates so that their distances on the map match more closely their respective proximities. The magnitude of these adjustments is controlled by a learning rate parameter, which decreases during the course of the simulation to avoid oscillatory behavior. Unlike classical multidimensional scaling (MDS) and nonlinear mapping (NLM), SPE scales linearly with respect to sample size, and can be applied to very large data sets that are intractable by conventional embedding procedures. The method is programmatically simple, robust, and convergent, and can be applied to a wide range of scientific problems involving exploratory data analysis and visualization. (C) 2003 Wiley Periodicals, Inc.
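The refinement loop described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's code: it assumes the proximities r_ij are already Euclidean-like target distances, that each pair update moves both points along the line joining them by λ·½·(r_ij − d_ij)/(d_ij + ε) times their difference vector, and that the learning rate λ falls linearly between two bounds. The function names `spe` and `normalized_stress`, the parameters `cycles`, `steps`, `lr_start`, `lr_end`, `eps` and `seed`, and the default of 10·n pair updates per cycle are all hypothetical choices made for illustration.

```python
import math
import random


def spe(proximities, dim=2, cycles=100, steps=None,
        lr_start=1.0, lr_end=0.01, eps=1e-10, seed=0):
    """Illustrative SPE sketch (assumed update rule, not the paper's code).

    proximities: symmetric n x n list of target distances r_ij.
    Returns a list of n coordinate lists of length `dim`.
    """
    n = len(proximities)
    rng = random.Random(seed)
    if steps is None:
        steps = 10 * n  # hypothetical default: pair updates per cycle
    # Initial configuration: random points in the unit hypercube.
    coords = [[rng.random() for _ in range(dim)] for _ in range(n)]
    for c in range(cycles):
        # Learning rate decreases linearly from lr_start to lr_end.
        lr = lr_start - (lr_start - lr_end) * c / max(cycles - 1, 1)
        for _ in range(steps):
            i, j = rng.sample(range(n), 2)  # random pair of distinct objects
            xi, xj = coords[i], coords[j]
            d = math.dist(xi, xj)           # current distance on the map
            r = proximities[i][j]           # target proximity
            if d == r:
                continue
            # Move both points along the line joining them, each taking
            # half of the correction, scaled by the learning rate.
            scale = lr * 0.5 * (r - d) / (d + eps)
            for k in range(dim):
                delta = scale * (xi[k] - xj[k])
                xi[k] += delta
                xj[k] -= delta
    return coords


def normalized_stress(coords, proximities):
    """sqrt(sum (d_ij - r_ij)^2 / sum r_ij^2) over all pairs i < j."""
    num = den = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            r = proximities[i][j]
            num += (d - r) ** 2
            den += r ** 2
    return math.sqrt(num / den)


if __name__ == "__main__":
    # Toy data: a 4 x 4 grid whose exact Euclidean distances are the targets.
    grid = [(float(a), float(b)) for a in range(4) for b in range(4)]
    P = [[math.dist(p, q) for q in grid] for p in grid]
    before = normalized_stress(spe(P, cycles=0), P)   # initial random map
    after = normalized_stress(spe(P, cycles=100), P)  # refined map
    print(f"stress before: {before:.3f}, after: {after:.3f}")
```

Each pair update touches only two points and costs O(dim), so with the number of updates per cycle proportional to n, the total work grows linearly with sample size. This is consistent with the abstract's scaling claim, whereas the stress check above is quadratic and is included only to evaluate the toy example.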
Pages: 1215-1221
Number of pages: 7
Related papers
20 in total
  • [1] Nonlinear mapping networks
    Agrafiotis, DK
    Lobanov, VS
    [J]. JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2000, 40 (06): 1356-1362
  • [2] Combinatorial informatics in the post-genomics era
    Agrafiotis, DK
    Lobanov, VS
    Salemme, FR
    [J]. NATURE REVIEWS DRUG DISCOVERY, 2002, 1 (05): 337-346
  • [3] Agrafiotis DK, 2001, J COMPUT CHEM, V22, P488, DOI 10.1002/1096-987X(20010415)22:5<488::AID-JCC1020>3.0.CO;2-4
  • [5] Multidimensional scaling of combinatorial libraries without explicit enumeration
    Agrafiotis, DK
    Lobanov, VS
    [J]. JOURNAL OF COMPUTATIONAL CHEMISTRY, 2001, 22 (14): 1712-1722
  • [6] AGRAFIOTIS DK, 1995, 3 DIMENSIONAL PHARM
  • [7] EVALUATION OF PROJECTION ALGORITHMS
    BISWAS, G
    JAIN, AK
    DUBES, RC
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1981, 3 (06): 701-708
  • [8] Borg I., 1997, MODERN MULTIDIMENSIO
  • [9] HEURISTIC RELAXATION METHOD FOR NONLINEAR MAPPING IN CLUSTER ANALYSIS
    CHANG, CL
    LEE, RCT
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS, 1973, SMC3 (02): 197-200
  • [10] Crippen G. M., 1988, DISTANCE GEOMETRY MO