An efficient implementation of distance-based diversity measures based on k-d trees

被引:44
作者
Agrafiotis, DK [1 ]
Lobanov, VS [1 ]
机构
[1] Three Dimens Pharmaceut Inc, Exton, PA 19341 USA
来源
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 1999年 / 39卷 / 01期
关键词
D O I
10.1021/ci980100c
中图分类号
O6 [化学];
学科分类号
0703 ;
摘要
The problem of quantifying molecular diversity continues to attract significant interest among computational chemists. Most algorithms reported to date are distance-based and scale to the square of the size of the data set. This paper reports an alternative algorithm based on k-dimensional (or k-d) trees. k-d trees are combinatorial data structures that allow expedient location of nearest neighbors in multivariate spaces. Nearest neighbor detection forms the basis of many popular diversity measures, such as maximin, minimum spanning trees, and many others. In this report, we demonstrate that k-d trees exhibit excellent scaling characteristics and can be used to accelerate diversity estimation without compromising the quality of the design. The advantages of this approach are contrasted with an alternative algorithm that was recently proposed by Turner et al. based on the cosine similarity coefficient.
引用
收藏
页码:51 / 58
页数:8
相关论文
共 23 条
[11]  
LAJINESS MS, 1991, QSAR RATIONAL APPROA, P201
[12]  
LIN SK, 1996, MOLECULES, V1, P57
[13]   MEASURING DIVERSITY - EXPERIMENTAL-DESIGN OF COMBINATORIAL LIBRARIES FOR DRUG DISCOVERY [J].
MARTIN, EJ ;
BLANEY, JM ;
SIANI, MA ;
SPELLMEYER, DC ;
WONG, AK ;
MOOS, WH .
JOURNAL OF MEDICINAL CHEMISTRY, 1995, 38 (09) :1431-1436
[14]  
MARTIN EJ, 1997, REV COMP CH, V10, P75
[15]  
MOUNT J, 1996, IBC 6 ANN C RAT DRUG
[16]  
OMOHUNDRO SM, 1991, ADV NEURAL INFORMATI, V3, P693
[17]  
Omohundro StephenM., 2009, Five Balltree Construction Algorithms
[18]  
Polinsky A., 1996, MOL DIVERSITY COMBIN, P219
[19]  
SNAREY M, IN PRESS J MOL GRAPH
[20]   REFINEMENTS TO NEAREST-NEIGHBOR SEARCHING IN K-DIMENSIONAL TREES [J].
SPROULL, RF .
ALGORITHMICA, 1991, 6 (04) :579-589