A fractal approach for selecting an appropriate bin size for cell-based diversity estimation

Citations: 14
Authors
Agrafiotis, DK [1 ]
Rassokhin, DN [1 ]
Affiliations
[1] 3 Dimensional Pharmaceut Inc, Exton, PA 19341 USA
Source
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 2002 / Vol. 42 / No. 01
Keywords
DOI
10.1021/ci010314l
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
A novel approach for selecting an appropriate bin size for cell-based diversity assessment is presented. The method measures the sensitivity of the diversity index as a function of grid resolution, using a box-counting algorithm that is reminiscent of those used in fractal analysis. It is shown that the relative variance of the diversity score (sum of squared cell occupancies) of several commonly used molecular descriptor sets exhibits a bell-shaped distribution, whose exact characteristics depend on the distribution of the data set, the number of points considered, and the dimensionality of the feature space. The peak of this distribution represents the optimal bin size for a given data set and sample size. Although box counting can be performed in an algorithmically efficient manner, the ability of cell-based methods to distinguish between subsets of different spread falls sharply with dimensionality, and the method becomes useless beyond a few dimensions.
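The procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that descriptors are scaled to the unit hypercube, that "relative variance" means the variance of the score over random subsets divided by the squared mean, and that every axis uses the same number of bins. The function names (`cell_score`, `relative_variance`, `scan_bin_counts`) and default parameters are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, pvariance


def cell_score(points, bins):
    """Sum of squared cell occupancies for points in the unit hypercube.

    Each coordinate is mapped to one of `bins` equal-width cells; a value
    of exactly 1.0 is clamped into the last cell.
    """
    counts = Counter(
        tuple(min(int(x * bins), bins - 1) for x in p) for p in points
    )
    return sum(c * c for c in counts.values())


def relative_variance(data, bins, sample_size, trials, rng):
    """Variance of the score over random subsets, divided by the squared mean.

    This normalization is an assumption; the abstract does not define it.
    """
    scores = [
        cell_score(rng.sample(data, sample_size), bins) for _ in range(trials)
    ]
    m = mean(scores)
    return pvariance(scores) / (m * m)


def scan_bin_counts(data, bin_counts, sample_size=50, trials=200, seed=0):
    """Relative variance of the score for each candidate grid resolution."""
    rng = random.Random(seed)
    return {
        b: relative_variance(data, b, sample_size, trials, rng)
        for b in bin_counts
    }
```

Under these assumptions, calling `scan_bin_counts(data, range(1, 33))` on a list of coordinate tuples and taking the key with the largest value gives the resolution at which the score is most sensitive to the choice of subset, which is the role the abstract assigns to the peak of the distribution.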
Pages: 117-122
Number of pages: 6