Molecular diversity and representativity in chemical databases

Times cited: 85

Authors:

Bayada, DM [1]

Hamersma, H [1]

van Geerestein, VJ [1]

Affiliations:

[1] NV Organon, Dept Mol Design & Informat, NL-5340 BH Oss, Netherlands

Source:

JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 1999 / Vol. 39 / No. 01

Keywords:

DOI:

10.1021/ci980109e

Chinese Library Classification:

O6 [Chemistry];

Discipline classification code:

0703 ;

Abstract:

It is now common practice in the pharmaceutical industry to use molecular diversity selection methods. With the advent of high throughput screening and combinatorial chemistry, compounds must be rationally selected from databases of hundreds of thousands of compounds to be tested for several biological activities. We explore the differences between diversity and representativity. Validation runs were made for different diversity selection methods (such as the MaxMin function), several representativity techniques (selection of compounds closest to centroids of clusters, Kohonen neural networks, nonlinear scaling of descriptor values), and various types of descriptors (topological and 3D fingerprints) including some validated whole-molecule numerical descriptors that were chosen for their correlation with biological activities. We find that only clustering based on fingerprints or on whole-molecule descriptors gives results consistently superior to random selection in extracting a diverse set of activities from a file with potential drug molecules. The results further indicate that clustering selection from fingerprints is biased toward small molecules, a behavior that might partly explain its success over other types of methods. Using numerical descriptors instead of fingerprints removes this bias without penalising performance too much.
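The MaxMin function mentioned in the abstract is commonly understood as a greedy dissimilarity-based selection: starting from a seed compound, repeatedly pick the compound whose distance to its nearest already-selected neighbour is largest. The following is a minimal sketch of that idea, not the authors' implementation. It assumes fingerprints are represented as sets of on-bit positions and uses Tanimoto distance; the function names `tanimoto_distance` and `maxmin_select`, the seed choice, and the toy data are all illustrative assumptions.

```python
from typing import FrozenSet, List, Sequence


def tanimoto_distance(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Return 1 - Tanimoto similarity for fingerprints given as sets of on-bits."""
    union = len(a | b)
    if union == 0:
        # Two empty fingerprints are treated as identical (an assumption).
        return 0.0
    return 1.0 - len(a & b) / union


def maxmin_select(fps: Sequence[FrozenSet[int]], k: int, seed_index: int = 0) -> List[int]:
    """Greedy MaxMin selection of k compound indices (illustrative sketch).

    At each step, choose the unselected compound whose distance to its
    nearest selected compound is largest.
    """
    if not 0 < k <= len(fps):
        raise ValueError("k must be between 1 and the number of fingerprints")

    selected = [seed_index]
    # min_dist[i] holds the distance from compound i to its nearest selected compound.
    min_dist = [tanimoto_distance(fp, fps[seed_index]) for fp in fps]

    while len(selected) < k:
        candidates = [i for i in range(len(fps)) if i not in selected]
        nxt = max(candidates, key=lambda i: min_dist[i])
        selected.append(nxt)
        # Update nearest-selected distances with the newly added compound.
        for i, fp in enumerate(fps):
            d = tanimoto_distance(fp, fps[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Toy fingerprints (hypothetical data, for illustration only).
toy_fps = [
    frozenset({1, 2, 3}),
    frozenset({1, 2, 4}),
    frozenset({7, 8, 9}),
    frozenset({1, 2, 3, 4}),
]
picked = maxmin_select(toy_fps, k=3)
```

In this toy example the second pick is the compound that shares no bits with the seed, which shows why MaxMin favours outliers. A representativity method, such as choosing the compound closest to each cluster centroid, would instead favour compounds in densely populated regions.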
