Development of a Fingerprint Reduction Approach for Bayesian Similarity Searching Based on Kullback-Leibler Divergence Analysis

Cited by: 23
Authors
Nisius, Britta [1]
Vogt, Martin [1]
Bajorath, Juergen [1]
Affiliations
[1] Rhein Friedrich Wilhelms Univ Bonn, Dept Life Sci Informat, B IT, LIMES Program Unit Chem Biol & Med Chem, D-53113 Bonn, Germany
Keywords
DIMENSIONAL DESCRIPTOR SPACES; ACTIVE COMPOUNDS; PERFORMANCE; MOLECULES; DATABASE; FUSION; 2D;
DOI
10.1021/ci900087y
Chinese Library Classification (CLC) number
R914 [Medicinal Chemistry];
Subject classification code
100701 ;
Abstract
The contribution of individual fingerprint bit positions to similarity search performance is systematically evaluated. A method is introduced to determine bit significance on the basis of Kullback-Leibler divergence analysis of bit distributions in active and database compounds. Bit divergence analysis and Bayesian compound screening share a common methodological foundation. Hence, given the significance ranking of all individual bit positions comprising a fingerprint, subsets of bits are evaluated in the context of Bayesian screening, and minimal fingerprint representations are determined that meet or exceed the search performance of unmodified fingerprints. For fingerprints of different design evaluated on many compound activity classes, we consistently find that subsets of fingerprint bit positions are responsible for search performance. In part, these subsets are very small and contain in some cases only a few fingerprint bit positions. Structural or pharmacophore patterns captured by preferred bit positions can often be directly associated with characteristic features of active compounds. In some cases, reduced fingerprint representations clearly exceed the search performance of the original fingerprints. Thus, fingerprint reduction likely represents a promising approach for practical applications.
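The bit ranking described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the following are assumptions made for illustration only: each bit position is modeled as an independent Bernoulli variable, bit frequencies are Laplace-smoothed to avoid zero probabilities, and fingerprints are represented as sets of on-bit indices. The function names and toy data are hypothetical and are not taken from the paper.

```python
import math


def bit_frequencies(fingerprints, n_bits, pseudocount=1.0):
    """Smoothed probability that each bit position is set.

    `fingerprints` is a list of sets of on-bit indices (an assumed
    representation). Laplace smoothing keeps every probability
    strictly between 0 and 1 so the logarithms below are defined.
    """
    counts = [0] * n_bits
    for fp in fingerprints:
        for i in fp:
            counts[i] += 1
    n = len(fingerprints)
    return [(c + pseudocount) / (n + 2 * pseudocount) for c in counts]


def bernoulli_kl(p, q):
    """Kullback-Leibler divergence D(P || Q) of two Bernoulli distributions."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def rank_bits(actives, database, n_bits):
    """Rank bit positions by divergence between active and database compounds."""
    p = bit_frequencies(actives, n_bits)
    q = bit_frequencies(database, n_bits)
    scores = [bernoulli_kl(pi, qi) for pi, qi in zip(p, q)]
    ranking = sorted(range(n_bits), key=lambda i: scores[i], reverse=True)
    return ranking, scores


# Toy data (hypothetical): bit 0 is set in every active but in no database compound.
actives = [{0, 1}, {0, 2}, {0, 1}]
database = [{1}, {2}, {1, 2}, {3}]
ranking, scores = rank_bits(actives, database, n_bits=4)
```

Under these assumptions, a reduced fingerprint would keep only the top-ranked positions, for example `ranking[:k]` for a chosen `k`, and the search would then be repeated with that subset to compare its performance against the full fingerprint.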
Pages: 1347-1358
Page count: 12
Related papers
31 in total
[1]  
[Anonymous], PIP PIL
[2]  
[Anonymous], 2009, MOL OP ENV MOE
[3]   Similarity searching using compound class-specific combinations of substructures found in randomly generated molecular fragment populations [J].
Batista, Jose ;
Bajorath, Juergen .
CHEMMEDCHEM, 2008, 3 (01) :67-73
[4]   Similarity searching of chemical databases using atom environment descriptors (MOLPRINT 2D): Evaluation of performance [J].
Bender, A ;
Mussa, HY ;
Glen, RC ;
Reiling, S .
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2004, 44 (05) :1708-1718
[5]  
BERTHOLD M, 2007, INTELL DATA ANAL, P245
[6]  
Chen X, 2001, COMB CHEM HIGH T SCR, V4, P719
[7]  
*DAYL CHEM INF SYS, 2009, SMARTS
[8]   Molecular similarity analysis in virtual screening: foundations, limitations and novel approaches [J].
Eckert, Hanna ;
Bajorath, Juergen .
DRUG DISCOVERY TODAY, 2007, 12 (5-6) :225-233
[9]   Anatomy of fingerprint search calculations on structurally diverse sets of active compounds [J].
Godden, JW ;
Stahura, FL ;
Bajorath, J .
JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2005, 45 (06) :1812-1819
[10]   New methods for ligand-based virtual screening: Use of data fusion and machine learning to enhance the effectiveness of similarity searching [J].
Hert, J ;
Willett, P ;
Wilton, DJ ;
Acklin, P ;
Azzaoui, K ;
Jacoby, E ;
Schuffenhauer, A .
JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2006, 46 (02) :462-470