Similarity-based data mining in files of two-dimensional chemical structures using fingerprint measures of molecular resemblance

Cited by: 20
Authors
Willett, Peter [1]
Affiliation
[1] Univ Sheffield, Informat Sch, Sheffield, S Yorkshire, England
Keywords
COMBINATORIAL LIBRARIES; CLUSTERING METHODS; COMPOUND SELECTION; GROUP FUSION; DIVERSITY; DISSIMILARITY; PROPERTY; OPTIMIZATION; DESCRIPTORS; ALGORITHMS
DOI
10.1002/widm.26
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper reviews the use of measures of intermolecular similarity for processing databases of chemical structures, which play an important role in the discovery of new drugs by the pharmaceutical industry. The similarity measures considered here are based on a fingerprint representation of molecular structure: a fingerprint is a vector encoding the presence of fragment substructures in a molecule, and the similarity between a pair of fingerprints is computed using an association coefficient such as the Tanimoto coefficient. The Similar Property Principle provides the basic rationale for the use of similarity methods in three important chemoinformatics applications: similarity searching, database clustering, and molecular diversity analysis. Similarity searching identifies the molecules in a database that are most similar to a user-defined, biologically active query molecule, with data fusion providing an effective way of combining the results of multiple similarity searches. Cluster analysis, typically using the Jarvis-Patrick, Ward, or divisive k-means clustering methods, enables the cost-effective selection of molecules for biological testing, for property prediction, and for investigating database overlap. Molecular diversity analysis, typically using cluster-based, dissimilarity-based, or optimization-based approaches, identifies structurally diverse sets of molecules, so that the full chemical space spanned by a database is tested in the search for novel bioactive molecules. © 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011, 1, 241-251. DOI: 10.1002/widm.26
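The fingerprint-based similarity search and data fusion described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: fingerprints are assumed to be stored as sets of "on" fragment-bit indices, the molecule names and bit values are invented toy data, the function names `tanimoto`, `similarity_search`, and `max_fusion` are hypothetical, and the MAX rule is only one possible fusion rule. The Tanimoto coefficient is computed as c / (a + b - c), where a and b are the numbers of bits set in each fingerprint and c is the number they share.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a fingerprint is the set of indices of fragment bits set to 1.
Fingerprint = FrozenSet[int]


def tanimoto(fp_a: Fingerprint, fp_b: Fingerprint) -> float:
    """Tanimoto coefficient c / (a + b - c) for two binary fingerprints."""
    common = len(fp_a & fp_b)
    union = len(fp_a) + len(fp_b) - common
    # Convention chosen here (an assumption): two empty fingerprints score 0.
    return common / union if union else 0.0


def similarity_search(
    query: Fingerprint, database: Dict[str, Fingerprint], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Rank database molecules by decreasing similarity to one query."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # ties broken by name
    return scored[:top_k]


def max_fusion(
    queries: List[Fingerprint], database: Dict[str, Fingerprint], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Fuse several searches by keeping each molecule's best score (MAX rule)."""
    fused = [
        (name, max(tanimoto(q, fp) for q in queries))
        for name, fp in database.items()
    ]
    fused.sort(key=lambda pair: (-pair[1], pair[0]))
    return fused[:top_k]


# Toy data, invented for illustration only.
database = {
    "mol_A": frozenset({1, 2, 3, 4}),
    "mol_B": frozenset({3, 4, 5, 6}),
    "mol_C": frozenset({7, 8, 9}),
}
query_1 = frozenset({1, 2, 3, 5})
query_2 = frozenset({5, 6, 7})

single_ranking = similarity_search(query_1, database)
fused_ranking = max_fusion([query_1, query_2], database)
```

With this toy data, `mol_C` shares no bits with `query_1` and scores 0 in the single search, but the fused ranking gives it a non-zero score because it shares one bit with `query_2`. Real fingerprints are usually fixed-length bit vectors produced by a chemoinformatics toolkit; the set representation is used here only to keep the sketch self-contained.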
Pages: 241-251
Page count: 11