Differential Shannon entropy as a sensitive measure of differences in database variability of molecular descriptors

Cited by: 49
Authors
Godden, JW
Bajorath, J
Affiliations
[1] New Chem Entities Inc, Bothell, WA 98011 USA
[2] Univ Washington, Dept Biol Struct, Seattle, WA 98195 USA
[3] New Chem Entities, Seattle, WA 98195 USA
Source
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 2001 / Vol. 41 / Issue 04
DOI
10.1021/ci0102867
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703 ;
Abstract
A method termed Differential Shannon Entropy (DSE) is introduced to compare differences in information content and variance of molecular descriptors between compound databases. The analysis is based on histograms recording the individual and grouped distributions of molecular descriptors and calculation of Shannon entropy (SE), a formalism originally applied to digital communication. We have recently shown that SE values reflect the nonparametric variability of descriptor settings. Now the analysis has been advanced to assess differences in information content of 143 molecular descriptors in databases containing synthetic compounds, natural products, or drug-like molecules. The DSE metric captures the degree to which descriptor distributions complement or duplicate information contained in molecular databases. In our analysis, we observe significant differences for a number of descriptors and rank them according to their associated DSE values. Using DSE calculations, relative information content of different types of descriptors can be quantified, even if differences are subtle.
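The histogram-based calculation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the DSE formula, so the form used here (entropy of the pooled histogram minus the mean of the two individual entropies) is an assumption, as are the function names, the equal-width binning over the shared value range, and the default bin count. It is not presented as the authors' implementation.

```python
import math


def histogram(values, lo, hi, bins):
    """Count values into `bins` equal-width bins spanning [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp so that v == hi falls into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a histogram given as raw bin counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def differential_shannon_entropy(values_a, values_b, bins=10):
    """Assumed form: SE(pooled) - (SE(A) + SE(B)) / 2 for one descriptor."""
    lo = min(min(values_a), min(values_b))
    hi = max(max(values_a), max(values_b))
    if hi == lo:
        return 0.0  # a constant descriptor carries no information
    hist_a = histogram(values_a, lo, hi, bins)
    hist_b = histogram(values_b, lo, hi, bins)
    # Pool the raw counts bin by bin (assumption: no rescaling for database size).
    pooled = [a + b for a, b in zip(hist_a, hist_b)]
    se_a = shannon_entropy(hist_a)
    se_b = shannon_entropy(hist_b)
    return shannon_entropy(pooled) - (se_a + se_b) / 2
```

Under this assumed definition, two databases with identical descriptor histograms give a value of zero, while databases whose values occupy disjoint bins give a positive value. Because raw counts are pooled, databases of very different sizes would need rescaling before comparison.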
Pages: 1060-1066
Number of pages: 7
Related papers
26 in total
[1]   Can we learn to distinguish between "drug-like" and "nondrug-like" molecules? [J].
Ajay ;
Walters, WP ;
Murcko, MA .
JOURNAL OF MEDICINAL CHEMISTRY, 1998, 41 (18) :3314-3324
[2]   CHEMICAL GRAPHS .34. 5 NEW TOPOLOGICAL INDEXES FOR THE BRANCHING OF TREE-LIKE GRAPHS [J].
BALABAN, AT .
THEORETICA CHIMICA ACTA, 1979, 53 (04) :355-375
[3]   HIGHLY DISCRIMINATING DISTANCE-BASED TOPOLOGICAL INDEX [J].
BALABAN, AT .
CHEMICAL PHYSICS LETTERS, 1982, 89 (05) :399-404
[4]   The information content of 2D and 3D structural descriptors relevant to ligand-receptor binding [J].
Brown, RD ;
Martin, YC .
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 1997, 37 (01) :1-9
[5]  
Brown RD, 1997, PERSPECT DRUG DISCOV, V7-8, P31
[6]   Computational methods in molecular diversity and combinatorial chemistry [J].
Bures, MG ;
Martin, YC .
CURRENT OPINION IN CHEMICAL BIOLOGY, 1998, 2 (03) :376-380
[7]   Variability of molecular descriptors in compound databases revealed by Shannon entropy calculations [J].
Godden, JW ;
Stahura, FL ;
Bajorath, J .
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2000, 40 (03) :796-800
[8]  
JAMES CA, 1995, DAYLIGHT FINGERPRINT
[9]   A widely applicable set of descriptors [J].
Labute, P .
JOURNAL OF MOLECULAR GRAPHICS & MODELLING, 2000, 18 (4-5) :464-477
[10]  
Mason J S, 2000, Pac Symp Biocomput, P576