How Similar Are Similarity Searching Methods? A Principal Component Analysis of Molecular Descriptor Space

Cited by: 229
Authors
Bender, Andreas [1]
Jenkins, Jeremy L. [1]
Scheiber, Josef [1]
Sukuru, Sai Chetan K. [1]
Glick, Meir [1]
Davies, John W. [1]
Affiliations
[1] Novartis Institutes for BioMedical Research Inc., Center for Proteomic Chemistry, Lead Discovery Informatics, Cambridge, MA 02139, USA
Keywords
BIOACTIVE REFERENCE STRUCTURES; DATA FUSION; CHEMICAL SIMILARITY; BIOLOGICAL-ACTIVITY; CHEMOGENOMICS; FINGERPRINTS; VALIDATION; DESIGN; 2D; PERFORMANCE
DOI
10.1021/ci800249s
Chinese Library Classification number
R914 [Medicinal Chemistry]
Subject classification code
100701
Abstract
Different molecular descriptors capture different aspects of molecular structures, but this effect has not yet been quantified systematically on a large scale. In this work, we calculate the similarity of 37 descriptors by repeatedly selecting query compounds and ranking the rest of the database. Euclidean distances between the rank-orderings produced by different descriptors are calculated to determine descriptor (as opposed to compound) similarity, followed by PCA for visualization. Four broad descriptor classes are identified: circular fingerprints; circular fingerprints considering counts; path-based and keyed fingerprints; and pharmacophoric descriptors. Descriptor behavior is defined much more by these four classes than by the particular parametrization. Using counts instead of the presence/absence of fingerprints significantly changes descriptor behavior, which is crucial for the performance of topological autocorrelation vectors, but not of circular fingerprints. Four-point pharmacophores (piDAPH4) surprisingly lead to much higher retrieval rates than three-point pharmacophores (28.21% vs 19.15%) but still to a similar rank-ordering of compounds (retrieval of similar actives). Looking into individual rankings, circular fingerprints seem more appropriate than path-based fingerprints if complex ring systems or branching patterns are present; count-based fingerprints could be more suitable in databases with a large number of repeated subunits (amide bonds, sugar rings, terpenes). Information-based selection of diverse fingerprints for consensus scoring (ECFP4/TGD fingerprints) led to only marginal improvement over single-fingerprint results. While it seems to be nontrivial to exploit orthogonal descriptor behavior to improve retrieval rates in consensus virtual screening, those descriptors still each retrieve different actives, which corroborates the strategy of employing diverse descriptors individually in prospective virtual screening settings.
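The descriptor-comparison procedure summarized in the abstract (rank the database for each query, compare the rank-orderings of different descriptors by Euclidean distance, then project with PCA) can be illustrated with a minimal sketch. This is not the authors' implementation: the Tanimoto coefficient, the random binary fingerprints, the SVD-based PCA, and all function and descriptor names (`rank_profile`, `descriptor_map`, `fp_a`, and so on) are assumptions chosen only for illustration.

```python
import numpy as np

def tanimoto(query, database):
    """Tanimoto similarity of one binary fingerprint to each row of a binary matrix.
    (Assumed similarity coefficient; the abstract does not name one.)"""
    common = (database & query).sum(axis=1)
    union = (database | query).sum(axis=1)
    return np.where(union > 0, common / np.maximum(union, 1), 0.0)

def rank_profile(fps, query_indices):
    """For each query, rank all other compounds (1 = most similar) and
    concatenate the rank vectors into one profile for this descriptor."""
    profiles = []
    for q in query_indices:
        rest = np.delete(np.arange(len(fps)), q)
        sims = tanimoto(fps[q], fps[rest])
        order = np.argsort(-sims, kind="stable")
        ranks = np.empty(len(rest))
        ranks[order] = np.arange(1, len(rest) + 1)
        profiles.append(ranks)
    return np.concatenate(profiles)

def descriptor_map(fingerprints, query_indices, n_components=2):
    """Return descriptor names, the pairwise Euclidean distance matrix between
    rank profiles, and PCA coordinates of the profiles (computed via SVD)."""
    names = list(fingerprints)
    profiles = np.vstack([rank_profile(fingerprints[n], query_indices) for n in names])
    diff = profiles[:, None, :] - profiles[None, :, :]
    distances = np.sqrt((diff ** 2).sum(axis=-1))
    centered = profiles - profiles.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    coords = u[:, :n_components] * s[:n_components]
    return names, distances, coords

# Synthetic stand-ins for real descriptors (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
base = rng.random((60, 128)) < 0.2
fingerprints = {
    "fp_a": base,
    "fp_a_noisy": base ^ (rng.random(base.shape) < 0.01),  # near-duplicate of fp_a
    "fp_b": rng.random((60, 128)) < 0.2,                   # unrelated descriptor
}
names, distances, coords = descriptor_map(fingerprints, query_indices=range(10))
```

In this toy setup, the near-duplicate descriptor should lie much closer to `fp_a` than the unrelated one, which mirrors the idea that descriptors of the same class rank compounds similarly. Plotting `coords` gives the kind of two-dimensional descriptor map that the abstract describes as the visualization step.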
Pages: 108-119
Number of pages: 12