Enhancing the effectiveness of virtual screening by fusing nearest neighbor lists: A comparison of similarity coefficients

Cited by: 91
Authors
Whittle, M
Gillet, VJ
Willett, P
Alex, A
Loesel, J
Affiliations
[1] Univ Sheffield, Krebs Inst Biomolec Res, Sheffield S10 2TN, S Yorkshire, England
[2] Univ Sheffield, Dept Informat Studies, Sheffield S10 2TN, S Yorkshire, England
[3] Pfizer Ltd, Pfizer Global Res & Dev, Sandwich CT13 9NJ, Kent, England
Source
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 2004 / Vol. 44 / No. 05
DOI
10.1021/ci049867x
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
This paper evaluates the effectiveness of various similarity coefficients for 2D similarity searching when multiple bioactive target structures are available. Similarity searches using several different activity classes within the MDL Drug Data Report and the Dictionary of Natural Products databases are performed using BCI 2D fingerprints. Using data fusion techniques to combine the resulting nearest neighbor lists we obtain group recall results which, in many cases, are a considerable improvement on standard average recall values obtained for individual structures. It is shown that the degree of improvement can be related to the structural diversity of the activity class that is searched for, the best results being found for the most diverse groups. The group recall of active compounds using subsets of the class is also investigated: for highly self-similar activity classes, the group recall improvement saturates well before the full activity class size is reached. A rough correlation is found between the relative improvement using the group recall and the square of the number of unique compounds available in all of the merged lists. The Tanimoto coefficient is found unambiguously to be the best coefficient to use for the recovery of active compounds using multiple targets. Furthermore, when using the Tanimoto coefficient, the "MAX" fusion rule is found to be more effective than the "SUM" rule for the combination of similarity searches from multiple targets. The use of group recall can lead to improved enrichment in database searches and virtual screening.
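The abstract's core comparison (Tanimoto similarity against several target structures, fused with the "MAX" or "SUM" rule) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: fingerprints are modeled as sets of on-bit indices rather than BCI 2D fingerprints, both fusion rules are applied to raw similarity scores, the toy molecules `a` to `d` are invented, and the names `tanimoto`, `fuse`, and `recall` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

Fingerprint = FrozenSet[int]  # assumption: a fingerprint is the set of its on-bit indices


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0


def fuse(targets: List[Fingerprint],
         database: Dict[str, Fingerprint],
         rule: str = "max") -> List[str]:
    """Rank database molecules by a fused score over all target structures.

    rule="max" keeps the best similarity to any target;
    rule="sum" adds the similarities to all targets.
    """
    combine = {"max": max, "sum": sum}[rule]
    fused = {
        name: combine([tanimoto(t, fp) for t in targets])
        for name, fp in database.items()
    }
    # Highest fused score first; ties broken by name for a stable order.
    return sorted(fused, key=lambda name: (-fused[name], name))


def recall(ranking: List[str], actives: Set[str], cutoff: int) -> float:
    """Fraction of the known actives found in the top `cutoff` positions."""
    found = sum(1 for name in ranking[:cutoff] if name in actives)
    return found / len(actives)


# Hypothetical toy data: two structurally different targets from one activity class.
targets = [frozenset({1, 2, 3, 4}), frozenset({10, 11, 12, 13})]
database = {
    "a": frozenset({1, 2, 3, 4, 5}),         # close to the first target only
    "b": frozenset({1, 2, 3, 10, 11, 12}),   # moderately close to both targets
    "c": frozenset({10, 11, 12}),            # close to the second target only
    "d": frozenset({20, 21}),                # unrelated to either target
}

max_ranking = fuse(targets, database, rule="max")
sum_ranking = fuse(targets, database, rule="sum")
```

In this toy set the two rules disagree about molecule `b`: SUM rewards its moderate similarity to both targets, while MAX favors molecules that closely resemble at least one target. That difference is only meant to show how the rules behave; it says nothing about which rule performs better on real data.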
Pages: 1840-1848
Number of pages: 9