An efficient implementation of distance-based diversity measures based on k-d trees

被引:44
作者
Agrafiotis, DK [1 ]
Lobanov, VS [1 ]
机构
[1] Three Dimens Pharmaceut Inc, Exton, PA 19341 USA
来源
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 1999年 / 39卷 / 01期
关键词
D O I
10.1021/ci980100c
中图分类号
O6 [化学];
学科分类号
0703 ;
摘要
The problem of quantifying molecular diversity continues to attract significant interest among computational chemists. Most algorithms reported to date are distance-based and scale to the square of the size of the data set. This paper reports an alternative algorithm based on k-dimensional (or k-d) trees. k-d trees are combinatorial data structures that allow expedient location of nearest neighbors in multivariate spaces. Nearest neighbor detection forms the basis of many popular diversity measures, such as maximin, minimum spanning trees, and many others. In this report, we demonstrate that k-d trees exhibit excellent scaling characteristics and can be used to accelerate diversity estimation without compromising the quality of the design. The advantages of this approach are contrasted with an alternative algorithm that was recently proposed by Turner et al. based on the cosine similarity coefficient.
引用
收藏
页码:51 / 58
页数:8
相关论文
共 23 条
[1]   Stochastic algorithms for maximizing molecular diversity [J].
Agrafiotis, DK .
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 1997, 37 (05) :841-851
[2]   On the use of information theory for assessing molecular diversity [J].
Agrafiotis, DK .
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 1997, 37 (03) :576-580
[3]  
AGRAFIOTIS DK, IN PRESS ANN REV COM
[4]  
AGRAFIOTIS DK, UNPUB
[5]  
AGRAFIOTIS DK, IN PRESS ENCY COMPUT
[6]  
AGRAFIOTIS DK, 1996, 3 EL COMP CHEM C
[7]  
[Anonymous], ACM COMP SURV, DOI DOI 10.1145/356924.356930
[8]   MULTIDIMENSIONAL BINARY SEARCH TREES USED FOR ASSOCIATIVE SEARCHING [J].
BENTLEY, JL .
COMMUNICATIONS OF THE ACM, 1975, 18 (09) :509-517
[9]  
Friedman J. H., 1977, ACM Transactions on Mathematical Software, V3, P209, DOI 10.1145/355744.355745
[10]   Optimization and visualization of molecular diversity of combinatorial libraries [J].
Hassan, M ;
Bielawski, JP ;
Hempel, JC ;
Waldman, M .
MOLECULAR DIVERSITY, 1996, 2 (1-2) :64-74