Maximum Unbiased Validation (MUV) Data Sets for Virtual Screening Based on PubChem Bioactivity Data

Cited by: 283
Authors
Rohrer, Sebastian G. [1]
Baumann, Knut [1]
Affiliations
[1] Tech Univ Carolo Wilhelmina Braunschweig, Inst Pharmaceut Chem, D-38106 Braunschweig, Germany
Keywords
CAMD TECHNIQUE PERFORMANCE; CHEMICAL DATABASES; DOCKING PROGRAMS; CLUSTAL-W; CLASSIFICATION; SELECTION; DESIGN; DRUGS; DESCRIPTORS; ALIGNMENT;
DOI
10.1021/ci8002649
Chinese Library Classification (CLC) number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
Refined nearest neighbor analysis was recently introduced for the analysis of virtual screening benchmark data sets. It constitutes a technique from the field of spatial statistics and provides a mathematical framework for the nonparametric analysis of mapped point patterns. Here, refined nearest neighbor analysis is used to design benchmark data sets for virtual screening based on PubChem bioactivity data. A workflow is devised that purges data sets of compounds active against pharmaceutically relevant targets from unselective hits. Topological optimization using experimental design strategies monitored by refined nearest neighbor analysis functions is applied to generate corresponding data sets of actives and decoys that are unbiased with regard to analogue bias and artificial enrichment. These data sets provide a tool for Maximum Unbiased Validation (MUV) of virtual screening methods. The data sets and a software package implementing the MUV design workflow are freely available at http://www.pharmchem.tu-bs.de/lehre/baumann/MUV.html.
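As an illustration of the kind of nearest neighbor functions the abstract refers to, the following is a minimal sketch, assuming the textbook spatial-statistics definitions of two empirical distribution functions: G(t), the fraction of points in a set whose nearest neighbor within the same set lies within distance t, and F(t), the fraction of reference points whose nearest point in that set lies within distance t. The function names, the use of decoys as reference points, the Euclidean metric, and the toy 2D coordinates are all assumptions for illustration; none of this is taken from the MUV software package.

```python
import math

def nearest_distance(point, others):
    """Euclidean distance from `point` to the closest point in `others`."""
    return min(math.dist(point, other) for other in others)

def empirical_g(actives, t):
    """Fraction of actives whose nearest *other* active lies within distance t."""
    hits = 0
    for i, active in enumerate(actives):
        rest = actives[:i] + actives[i + 1:]  # exclude the point itself
        if nearest_distance(active, rest) <= t:
            hits += 1
    return hits / len(actives)

def empirical_f(decoys, actives, t):
    """Fraction of decoys whose nearest active lies within distance t."""
    hits = sum(1 for decoy in decoys if nearest_distance(decoy, actives) <= t)
    return hits / len(decoys)

# Toy 2D "descriptor space" (hypothetical values, for illustration only).
actives = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
decoys = [(0.0, 0.5), (10.0, 10.0)]

g = empirical_g(actives, 1.0)
f = empirical_f(decoys, actives, 1.0)
print(f"G(1.0) = {g:.3f}, F(1.0) = {f:.3f}")
```

In this toy setting, a G value well above F at small t would suggest that the actives sit closer to each other than the decoys sit to the actives, which is one plausible way to read clustering among actives from such curves.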
Pages: 169-184
Page count: 16