Profile scaling increases the similarity search performance of molecular fingerprints containing numerical descriptors and structural keys

Times cited: 60
Authors
Xue, L
Godden, JW
Stahura, FL
Bajorath, J
Affiliations
[1] Albany Mol Res Inc, Dept Comp Aided Drug Discovery, Bothell Res Ctr, Bothell, WA 98011 USA
[2] Univ Washington, Dept Biol Struct, Seattle, WA 98195 USA
Source
Journal of Chemical Information and Computer Sciences | 2003 / Vol. 43 / No. 4
DOI
10.1021/ci030287u
Chinese Library Classification
O6 [Chemistry]
Discipline code
0703
Abstract
The concept of compound class-specific profiling and scaling of molecular fingerprints for similarity searching is discussed and applied to newly designed fingerprint representations. The approach is based on the analysis of characteristic patterns of bits in keyed fingerprints that are set on in compounds having equivalent biological activity. Once a fingerprint profile has been generated for a particular activity class, scaling factors weighted according to the observed bit frequencies are applied to signature bit positions when searching for similar compounds. In systematic similarity search calculations over 23 diverse activity classes, profile scaling consistently increased the performance of fingerprints containing property descriptors and/or structural keys. A significant improvement of approximately 15% was observed for a new fingerprint consisting of binary-encoded molecular property descriptors and structural keys. Under scaling conditions, this fingerprint, termed MP-MFP, correctly recognized on average close to 60% of all active test compounds, with only a few false positives. MP-MFP outperformed MACCS keys and other reference fingerprints. In general, optimum performance in scaling calculations was achieved at higher Tanimoto coefficient thresholds than in nonscaled calculations, thereby increasing search selectivity. The most effective scaling procedure was to put relatively high weight on signature bit positions that were always, or almost always, set on. Analysis of class-specific search performance revealed that profile scaling of MP-MFP improved the similarity search results for each of the 23 activity classes.
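The profiling and scaling steps described in the abstract can be illustrated with a small Python sketch. The abstract does not give the exact scaling formula, so everything below is an assumption made for illustration and is not the authors' method. A fingerprint is represented as the set of its "on" bit positions. A bit counts as a signature bit when its frequency in the activity class is at least a hypothetical cutoff `min_freq` (0.9 here), and such bits receive the hypothetical weight `1 + scale * frequency`. Similarity is a weighted Tanimoto coefficient. A database compound counts as a hit when its best score against any reference active reaches the threshold. The names `build_profile`, `profile_weights`, `scaled_tanimoto` and `similarity_search`, and the toy data, are all invented for this example.

```python
from typing import List, Sequence, Set

# Assumption: a keyed fingerprint is stored as the set of bit positions set on.
Fingerprint = Set[int]


def build_profile(actives: Sequence[Fingerprint], n_bits: int) -> List[float]:
    """Fraction of class actives in which each bit position is set on."""
    counts = [0] * n_bits
    for fp in actives:
        for bit in fp:
            counts[bit] += 1
    return [c / len(actives) for c in counts]


def profile_weights(profile: Sequence[float], min_freq: float = 0.9,
                    scale: float = 9.0) -> List[float]:
    """Hypothetical rule: signature bits (frequency >= min_freq) get weight
    1 + scale * frequency; every other bit keeps weight 1."""
    return [1.0 + scale * f if f >= min_freq else 1.0 for f in profile]


def scaled_tanimoto(a: Fingerprint, b: Fingerprint,
                    weights: Sequence[float]) -> float:
    """Weighted Tanimoto: summed weights of shared bits divided by the
    summed weights of bits set on in either fingerprint."""
    union = a | b
    if not union:
        return 0.0
    shared = sum(weights[i] for i in a & b)
    return shared / sum(weights[i] for i in union)


def similarity_search(database: Sequence[Fingerprint],
                      references: Sequence[Fingerprint],
                      weights: Sequence[float], threshold: float) -> List[int]:
    """Indices of database compounds whose highest scaled similarity to any
    reference active reaches the threshold (assumed max-fusion scheme)."""
    hits = []
    for idx, fp in enumerate(database):
        best = max(scaled_tanimoto(fp, ref, weights) for ref in references)
        if best >= threshold:
            hits.append(idx)
    return hits


# Toy data (invented): bits 0 and 1 are on in every active.
actives = [{0, 1, 2}, {0, 1, 3}, {0, 1, 2, 4}]
profile = build_profile(actives, n_bits=6)
weights = profile_weights(profile)  # bits 0 and 1 -> 10.0, all others -> 1.0
database = [{0, 1, 5}, {2, 3, 4, 5}, {0, 1, 2, 3}]

print(round(scaled_tanimoto({0, 1, 5}, {0, 1, 2}, weights), 3))  # 0.909
print(round(scaled_tanimoto({0, 1, 5}, {0, 1, 2}, [1.0] * 6), 3))  # 0.5 unscaled
print(similarity_search(database, actives, weights, threshold=0.8))  # [0, 2]
```

In this toy case, up-weighting the always-on bits raises the similarity of compounds that share them, from 0.5 unscaled to about 0.91. Compounds that lack those bits are pushed down. This is in line with the abstract's observation that scaled searches performed best at higher Tanimoto thresholds.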
Pages: 1218-1225
Number of pages: 8