Chemical database mining through entropy-based molecular similarity assessment of randomly generated structural fragment populations

Times cited: 22
Authors
Batista, Jose [1]
Bajorath, Juergen [1]
Affiliations
[1] Univ Bonn, Dept Life Sci Informat, BIT, D-53113 Bonn, Germany
DOI
10.1021/ci600377m
Chinese Library Classification
R914 [Medicinal Chemistry];
Discipline classification code
100701;
Abstract
We describe a novel approach to search for active compounds that is based on the generation of random molecular fragment populations. As a similarity-based methodology, fragment profiling does not depend on the use of predefined descriptors of molecular structure and properties and the design of chemical space representations. To adapt the generation and comparison of random fragment populations for large-scale compound screening, we compare different fragmentation schemes, introduce the concept of compound class-specific fragment frequencies, and develop a novel entropic similarity metric for compound ranking. The approach has been extensively tested on 15 different compound activity classes with varying degrees of intraclass structural diversity and produced promising results in these calculations, comparable to similarity searching using fingerprints. A key feature of fragment profile searching is that the calculation of compound class-specific proportional Shannon entropy of random fragment distributions enables the identification of database molecules that share a significant number of signature substructures with known active compounds.
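The abstract names three ingredients: random fragment populations, compound class-specific fragment frequencies, and an entropy-based measure. The record does not give the formulas, so the following is only a minimal sketch of the general idea, not the authors' metric. It uses the standard Shannon entropy, represents fragments as plain strings, and assumes a simple "signature overlap" score. The function names and the `min_freq` threshold are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Standard Shannon entropy (in bits) of a fragment count distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def class_fragment_frequencies(populations):
    """Fraction of class compounds whose fragment population contains each fragment.

    `populations` is a list with one iterable of fragment labels per compound.
    """
    n = len(populations)
    seen = Counter()
    for pop in populations:
        seen.update(set(pop))  # count each fragment once per compound
    return {frag: k / n for frag, k in seen.items()}


def shared_signature_fraction(candidate, class_freq, min_freq=0.5):
    """Fraction of frequent class fragments that also occur in the candidate.

    Assumption: a fragment counts as a 'signature' if it occurs in at least
    `min_freq` of the known active compounds. This cutoff is illustrative only.
    """
    signature = {f for f, q in class_freq.items() if q >= min_freq}
    if not signature:
        return 0.0
    return len(signature & set(candidate)) / len(signature)
```

In such a sketch, database compounds would be ranked by a score of this kind computed against the fragment frequencies of a known activity class. How the paper combines frequencies into its proportional Shannon entropy is not stated in this record.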
Pages: 59-68
Page count: 10