Descriptor collision and confusion: Toward the design of descriptors to mask chemical structures

Cited by: 11
Authors
Bologa, C
Allu, TK
Olah, M
Kappler, MA
Oprea, TI
Affiliations
[1] Univ New Mexico, Sch Med, Div Biocomp, Albuquerque, NM 87131 USA
[2] Daylight Chem Informat Syst Inc, Santa Fe, NM 87501 USA
[3] Sunset Mol Discovery LLC, Santa Fe, NM 87505 USA
Keywords
chemical fingerprints; ChemNavigator; descriptor collision; descriptor confusion; masking chemical structures; PLS; QSAR; SMILES; WOMBAT;
DOI
10.1007/s10822-005-9020-4
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
We examined "descriptor collision" for several chemical fingerprint systems (MDL 320, Daylight, SMDL), and for a 2D-based descriptor set. For large databases (ChemNavigator and WOMBAT), the smallest collision rate remains around 5%. We systematically increase the "descriptor collision" rate (here termed "descriptor confusion"), in order to design a set of "descriptors to mask chemical structures", DMCS. If effective, a DMCS system would not allow third parties to determine the original chemical structures used to derive the DMCS set (i.e., reverse engineering). Using SMDL keys, the "confusion" rate is increased to 45.6% by eliminating those keys that have a low frequency of occurrence in WOMBAT structures. We applied an automated PLS engine, WB-PLS [Olah et al., J. Comput. Aided Mol. Des., 18 (2004) 437], to 1277 series of structures from 948 targets in WOMBAT, in order to validate the biological relevance of the SMDL descriptors as a potential DMCS set. The "reduced set" of SMDL descriptors has a small loss of modeling power (around 20%) compared to the initial descriptor set, while the collision rate is significantly increased. These results indicate that the development of an effective DMCS is possible. If well documented, DMCS systems would encourage private sector data release (e.g., related to water solubility) and directly benefit public sector science.
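The key-elimination idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the collision-rate definition used here (the fraction of structures whose fingerprint is identical to that of at least one other structure), the toy fingerprints, and the names `collision_rate`, `drop_rare_keys`, and `min_count` are all assumptions made for illustration.

```python
from collections import Counter


def collision_rate(fingerprints):
    """Fraction of structures sharing their fingerprint with another structure.

    Assumed definition for this sketch; the paper may define the rate differently.
    """
    counts = Counter(fingerprints)
    collided = sum(n for n in counts.values() if n > 1)
    return collided / len(fingerprints)


def drop_rare_keys(fingerprints, min_count):
    """Remove keys set in fewer than `min_count` structures (hypothetical threshold)."""
    key_counts = Counter(key for fp in fingerprints for key in fp)
    kept = {key for key, n in key_counts.items() if n >= min_count}
    return [frozenset(fp & kept) for fp in fingerprints]


# Toy data: each fingerprint is the set of key indices that are "on".
toy = [
    frozenset({1, 2}),
    frozenset({1, 2, 3}),
    frozenset({1, 4}),
    frozenset({1, 4}),
    frozenset({2, 5}),
]

before = collision_rate(toy)
after = collision_rate(drop_rare_keys(toy, min_count=2))
print(f"collision rate before: {before:.2f}, after dropping rare keys: {after:.2f}")
```

On this toy set, removing the keys that occur in only one structure makes more fingerprints identical, which is the "confusion" effect the abstract describes for SMDL keys in WOMBAT.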
Pages: 625-635
Page count: 11
Related papers
27 in total
[1]   NIH Molecular Libraries Initiative [J].
Austin, CP ;
Brady, LS ;
Insel, TR ;
Collins, FS .
SCIENCE, 2004, 306 (5699) :1138-1139
[2]   Topological and stereochemical molecular descriptors for databases useful in QSAR, similarity/dissimilarity and drug design [J].
Balaban, AT .
SAR AND QSAR IN ENVIRONMENTAL RESEARCH, 1998, 8 (1-2) :1-21
[3]   Topological indices: Their nature and mutual relatedness [J].
Basak, SC ;
Balaban, AT ;
Grunwald, GD ;
Gute, BD .
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2000, 40 (04) :891-898
[4]   Reoptimization of MDL keys for use in drug discovery [J].
Durant, JL ;
Leland, BA ;
Henry, DR ;
Nourse, JG .
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2002, 42 (06) :1273-1280
[5]   Reverse engineering chemical structures from molecular descriptors: how many solutions? [J].
Faulon, JL ;
Brown, WM ;
Martin, S .
JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN, 2005, 19 (9-10) :637-650
[6]   Why relevant chemical information cannot be exchanged without disclosing structures [J].
Filimonov, D ;
Poroikov, V .
JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN, 2005, 19 (9-10) :705-713
[7]   ITERATIVE PARTIAL EQUALIZATION OF ORBITAL ELECTRONEGATIVITY - A RAPID ACCESS TO ATOMIC CHARGES [J].
GASTEIGER, J ;
MARSILI, M .
TETRAHEDRON, 1980, 36 (22) :3219-3228
[8]   A FAST EMPIRICAL-METHOD FOR THE CALCULATION OF MOLECULAR POLARIZABILITY [J].
GLEN, RC .
JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN, 1994, 8 (04) :457-466
[9]   Chemoinformatics - a new name for an old problem? [J].
Hann, M ;
Green, R .
CURRENT OPINION IN CHEMICAL BIOLOGY, 1999, 3 (04) :379-383
[10]   KAPPLER MA, 2005, UNPUB J CHEM INF MOD, V45