A constant time algorithm for estimating the diversity of large chemical libraries

Times cited: 41
Author
Agrafiotis, DK [1]
Affiliation
[1] Three Dimens Pharmaceut Inc, Exton, PA 19341 USA
Source
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 2001 / Vol. 41 / No. 01
DOI
10.1021/ci000091j
Chinese Library Classification number
O6 [Chemistry]
Subject classification code
0703
Abstract
We describe a novel diversity metric for use in the design of combinatorial chemistry and high-throughput screening experiments. The method estimates the cumulative probability distribution of intermolecular dissimilarities in the collection of interest and then measures the deviation of that distribution from the respective distribution of a uniform sample using the Kolmogorov-Smirnov statistic. The distinct advantage of this approach is that the cumulative distribution can be easily estimated using probability sampling and does not require exhaustive enumeration of all pairwise distances in the data set. The function is intuitive, very fast to compute, does not depend on the size of the collection, and can be used to perform diversity estimates at both global and local scales. More importantly, it allows meaningful comparison of data sets of different cardinality and is not affected by the curse of dimensionality, which plagues many other diversity indices. The advantages of this approach are demonstrated using examples from the combinatorial chemistry literature.
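The abstract describes two steps: estimating the distribution of intermolecular dissimilarities from a random sample of pairs, and scoring how far that distribution deviates from a uniform reference with the Kolmogorov-Smirnov statistic. The Python sketch below illustrates the idea under assumptions the abstract does not state: each molecule is a descriptor vector already scaled to the unit hypercube, dissimilarity is Euclidean distance, and the "uniform sample" is approximated by points drawn uniformly from that hypercube. The function names, the sample sizes (`n_pairs`, `n_reference`), and the convention that a smaller statistic means a more uniform (more diverse) collection are hypothetical choices for illustration, not the paper's published implementation.

```python
import bisect
import math
import random


def sample_pairwise_distances(points, n_pairs, rng):
    """Return Euclidean distances for n_pairs randomly chosen pairs (i != j).

    The work depends on n_pairs, not on len(points), so the cost does not
    grow with the size of the collection.
    """
    n = len(points)
    distances = []
    for _ in range(n_pairs):
        i, j = rng.sample(range(n), 2)  # two distinct indices
        distances.append(math.dist(points[i], points[j]))
    return distances


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # Empirical CDFs are step functions, so the largest gap occurs at one
    # of the observed values.
    for x in a + b:
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def diversity_ks(points, n_pairs=2000, n_reference=1000, seed=0):
    """Hypothetical diversity score: KS distance between the sampled
    dissimilarity distribution of `points` and that of a uniform sample.

    Assumes every descriptor is already scaled to [0, 1].
    Smaller values mean the collection looks more like a uniform sample.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Assumed reference: points drawn uniformly from the unit hypercube.
    reference = [[rng.random() for _ in range(dim)] for _ in range(n_reference)]
    d_library = sample_pairwise_distances(points, n_pairs, rng)
    d_reference = sample_pairwise_distances(reference, n_pairs, rng)
    return ks_statistic(d_library, d_reference)


if __name__ == "__main__":
    rng = random.Random(42)
    spread = [[rng.random() for _ in range(5)] for _ in range(10_000)]
    clustered = [[0.5 + 0.02 * rng.gauss(0, 1) for _ in range(5)]
                 for _ in range(10_000)]
    print("spread:   ", diversity_ks(spread))
    print("clustered:", diversity_ks(clustered))
```

Because only `n_pairs` pairs are ever drawn, the cost in this sketch depends on the sample size and the descriptor dimension rather than on the number of molecules, which is the sense in which the estimate runs in constant time. A well-spread set should score near 0 and a tight cluster near 1; the exact values vary with the seed.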
Pages: 159-167
Number of pages: 9