Predicting the Predictability: A Unified Approach to the Applicability Domain Problem of QSAR Models

Cited by: 130
Authors
Horvath, Dragos [1]
Marcou, Gilles [1]
Varnek, Alexandre [1]
Affiliations
[1] Univ Strasbourg, Lab InfoChimie, CNRS, UMR 7177, Inst Chim, F-67000 Strasbourg, France
Keywords
TRICENTRIC PHARMACOPHORE FINGERPRINTS; SILICO STRUCTURAL SPACES; VITRO ACTIVITY SPACES; QUANTITATIVE STRUCTURE; NEIGHBORHOOD BEHAVIOR; THEORETICAL DESCRIPTORS; QSPR MODELS; VALIDATION; CANDIDATES; FRAGMENTS;
DOI
10.1021/ci9000579
Chinese Library Classification number
R914 [Medicinal Chemistry];
Discipline classification code
100701;
Abstract
The present work proposes a unified conceptual framework to describe and quantify the Applicability Domains (AD) of Quantitative Structure-Activity Relationships (QSARs). AD models are conceived as meta-models μ_μ designed to associate an untrustworthiness score with any molecule M subject to property prediction by a QSAR model μ. Untrustworthiness scores, or "AD metrics", ψ_μ(M) express the relationship between M (represented by its descriptors in chemical space) and the zones of that space populated by the training molecules on which model μ is based. Scores integrating some of the classical AD criteria (similarity-based, box-based) were considered in addition to newly invented terms such as the consensus prediction variance, the dissimilarity to outlier-free training sets, and the correlation breakdown count (the first two being the most successful). A loose correlation is expected to exist between this untrustworthiness and the error |P_μ(M) - P_expt(M)| affecting the property P_μ(M) predicted by μ. While high untrustworthiness does not preclude correct predictions, inaccurate predictions at low untrustworthiness must be avoided. This kind of relationship is characteristic of the Neighborhood Behavior (NB) problem: dissimilar molecule pairs may or may not display similar properties, but similar molecule pairs with different properties are explicitly "forbidden". Therefore, statistical tools developed to tackle this latter aspect were applied and led to a unified AD metric benchmarking scheme. A first use of untrustworthiness scores lies in the prioritization of predictions, without the need to specify a hard AD border. Moreover, if a significant set of external compounds is available, the formalism allows optimal AD borderlines to be fitted. Finally, consensus AD definitions were built by means of a nonparametric mixing scheme of two AD metrics of comparable quality and shown to outperform their respective parents.
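The idea of ranking predictions by an untrustworthiness score, and of penalizing only the "forbidden" case of a trusted but wrong prediction, can be illustrated with a minimal sketch. It is not the paper's method: the score used here (Euclidean distance to the nearest training molecule) is only a stand-in for a similarity-based AD metric, and the function names, toy descriptor vectors, errors, and cutoffs are all hypothetical.

```python
import math

def untrustworthiness(query, training):
    """Assumed stand-in AD metric: distance to the nearest training vector."""
    return min(math.dist(query, t) for t in training)

def prioritize(scores):
    """Indices of the predictions, from most to least trustworthy."""
    return sorted(range(len(scores)), key=lambda i: scores[i])

def count_forbidden(scores, errors, score_cutoff, error_cutoff):
    """Count trusted (low score) predictions that are nevertheless wrong."""
    return sum(
        1
        for s, e in zip(scores, errors)
        if s <= score_cutoff and e > error_cutoff
    )

# Toy 2D descriptor vectors (hypothetical data, for illustration only).
training = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
queries = [(0.1, 0.0), (3.0, 4.0), (1.0, 1.0)]
# Hypothetical absolute errors |P_mu(M) - P_expt(M)| for each query.
errors = [0.2, 1.5, 0.9]

scores = [untrustworthiness(q, training) for q in queries]
ranking = prioritize(scores)
forbidden = count_forbidden(scores, errors, score_cutoff=1.0, error_cutoff=0.5)

print("scores:", scores)
print("ranking (most trusted first):", ranking)
print("forbidden cases:", forbidden)
```

Note the asymmetry: the second query is both untrustworthy and badly predicted, which is tolerated, whereas the third query falls inside the assumed cutoff yet has a large error, so it is the only one counted as a violation. Ranking by score alone needs no cutoff at all, which is the sense in which predictions can be prioritized without fixing a hard AD border.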
Pages: 1762-1776
Page count: 15