Similarity searching

Cited by: 102
Authors
Stumpfe, Dagmar [1]
Bajorath, Juergen [1]
Affiliation
[1] Univ Bonn, Dept Life Sci Informat, B IT, D-5300 Bonn, Germany
Keywords
MOLECULAR SIMILARITY; CHEMICAL SIMILARITY; ACTIVE COMPOUNDS; DATA FUSION; STRUCTURAL DESCRIPTORS; NEIGHBORHOOD BEHAVIOR; PROPERTY DESCRIPTORS; GAUSSIAN DESCRIPTION; INFORMATION-CONTENT; DIVERGENCE ANALYSIS;
DOI
10.1002/wcms.23
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Similarity searching is one of the traditional and most widely applied approaches in chemical and pharmaceutical research to select compounds with desired properties from databases. The computational efficiency of many (but not all) similarity search techniques has further increased their popularity as compound databases began to rapidly grow in size. Different methods have been developed for small molecule similarity searching. However, foundations and intrinsic limitations of similarity searching are often not well understood, although a number of similarity methods are rather simplistic. Regardless of methodological details, all similarity search approaches depend on how molecular similarity is evaluated and quantified. In its essence, molecular similarity is a subjective concept and much dependent on how we represent and view molecular structures. Moreover, trying to understand the relationship between molecular similarity, however assessed, and structure-dependent properties including, first and foremost, biological activity continues to be a challenging problem. Consequently, although similarity searching usually provides a quantitative readout and a ranking of compounds relative to chosen reference molecules, predicting structure-activity relationships on the basis of calculated similarity values often involves subjective criteria and chemical intuition. Thus, similarity searching is still far from being a routine application in database mining. In this review, we first discuss important principles underlying similarity searching, describe its tasks, and introduce major categories of search methods. Then, we focus on molecular fingerprints, the design and application of which can be regarded as a paradigm for the similarity search field. © 2011 John Wiley & Sons, Ltd. WIREs Comput Mol Sci 2011, 1, 260-282. DOI: 10.1002/wcms.23
Pages: 260-282
Number of pages: 23