Analysis and display of the size dependence of chemical similarity coefficients

Times cited: 122
Authors
Holliday, JD
Salim, N
Whittle, M
Willett, P
Affiliations
[1] Univ Sheffield, Krebs Inst Biomolec Res, Sheffield S10 2TN, S Yorkshire, England
[2] Univ Sheffield, Dept Informat Studies, Sheffield S10 2TN, S Yorkshire, England
Source
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 2003 / Vol. 43 / No. 3
DOI
10.1021/ci034001x
Chinese Library Classification
O6 [Chemistry]
Discipline code
0703
Abstract
We discuss the size-bias inherent in several chemical similarity coefficients when used for the similarity searching or diversity selection of compound collections. Limits to the upper bounds of 14 standard similarity coefficients are investigated, and the results are used to identify some exceptional characteristics of a few of the coefficients. An additional numerical contribution to the known size bias in the Tanimoto coefficient is identified. Graphical plots with respect to relative bit density are introduced to further assess the coefficients. Our methods reveal the asymmetries inherent in most similarity coefficients that lead to bias in selection, most notably with the Forbes and Russell-Rao coefficients. Conversely, when applied to the recently introduced Modified Tanimoto coefficient our methods provide support for the view that it is less biased toward molecular size than most. In this work we focus our discussion on fragment-based bit strings, but we demonstrate how our approach can be generalized to continuous representations.
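The abstract names several coefficients but gives no formulas. As a hedged illustration, and not the authors' code, the sketch below computes four of the 14 coefficients from the usual counts for two fingerprints of length n: a (bits on in both), b (on only in the first), c (on only in the second) and d (off in both). Tanimoto is a/(a+b+c), Russell-Rao is a/n and Forbes is na/((a+b)(a+c)). The Modified Tanimoto used here, (2-p)/3 * T1 + (1+p)/3 * T0 with T0 = d/(b+c+d) and p = (2a+b+c)/(2n), is our reading of Fligner et al. and should be checked against that paper before use. The function names, the string-of-'0'/'1' representation and the 0.0 convention for undefined cases are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    a: int  # bits set in both strings
    b: int  # bits set only in the first string
    c: int  # bits set only in the second string
    d: int  # bits unset in both strings

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d


def counts(x: str, y: str) -> Counts:
    """Tally a, b, c, d for two equal-length strings of '0'/'1' (hypothetical helper)."""
    if len(x) != len(y):
        raise ValueError("bit strings must have equal length")
    a = b = c = d = 0
    for p, q in zip(x, y):
        if p == "1" and q == "1":
            a += 1
        elif p == "1":
            b += 1
        elif q == "1":
            c += 1
        else:
            d += 1
    return Counts(a, b, c, d)


def tanimoto(k: Counts) -> float:
    union = k.a + k.b + k.c
    return k.a / union if union else 0.0  # 0.0 for two empty strings: assumed convention


def russell_rao(k: Counts) -> float:
    return k.a / k.n  # can never exceed min(bits set in x, bits set in y) / n


def forbes(k: Counts) -> float:
    denom = (k.a + k.b) * (k.a + k.c)
    return k.n * k.a / denom if denom else 0.0  # values may exceed 1


def modified_tanimoto(k: Counts) -> float:
    # Assumed form: weighted mix of on-bit (T1) and off-bit (T0) Tanimoto,
    # weighted by p, the mean bit density of the two strings.
    p = (2 * k.a + k.b + k.c) / (2 * k.n)
    t1 = tanimoto(k)
    off_union = k.b + k.c + k.d
    t0 = k.d / off_union if off_union else 0.0
    return (2 - p) / 3 * t1 + (1 + p) / 3 * t0


def tanimoto_upper_bound(x_bits: int, y_bits: int) -> float:
    """Largest Tanimoto possible for strings with x_bits and y_bits set,
    reached when the smaller set of on-bits lies inside the larger one."""
    hi = max(x_bits, y_bits)
    return min(x_bits, y_bits) / hi if hi else 0.0


if __name__ == "__main__":
    query = "1" * 4 + "0" * 12   # 4 of 16 bits set
    medium = "1" * 8 + "0" * 8   # superset of the query, 8 bits set
    large = "1" * 12 + "0" * 4   # superset of the query, 12 bits set
    for name, fp in [("medium", medium), ("large", large)]:
        k = counts(query, fp)
        print(name, round(tanimoto(k), 4), round(russell_rao(k), 4),
              round(forbes(k), 4), round(modified_tanimoto(k), 4))
```

In this toy case the 4-bit query is contained in both candidates, yet its Tanimoto score drops from 0.5 to 1/3 as the candidate grows, matching the ceiling min(x, y)/max(x, y). Forbes exceeds 1 and scores the smaller candidate higher, while Russell-Rao is capped at min(x, y)/n and so can only reach high values for strings with many bits set.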
Pages: 819-828
Number of pages: 10