Rational selection of training and test sets for the development of validated QSAR models

Cited by: 583
Authors
Golbraikh, A [1]
Shen, M [1]
Xiao, ZY [1]
Xiao, YD [1]
Lee, KH [1]
Tropsha, A [1]
Affiliation
[1] Univ N Carolina, Sch Pharm, Div Med Chem & Nat Prod, Chapel Hill, NC 27599 USA
DOI
10.1023/A:1025386326946
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Quantitative Structure-Activity Relationship (QSAR) models are increasingly used to screen chemical databases and/or virtual chemical libraries for potentially bioactive molecules. These developments emphasize the importance of rigorous model validation to ensure that the models have acceptable predictive power. Using the k-nearest-neighbors (kNN) variable selection QSAR method to analyze several datasets, we recently demonstrated that the widely accepted leave-one-out (LOO) cross-validated R² (q²) is an inadequate measure of the predictive ability of the models [Golbraikh, A., Tropsha, A. Beware of q²! J. Mol. Graphics Mod. 20, 269-276 (2002)]. Herein, we provide additional evidence that there is no correlation between the values of q² for the training set and the accuracy of prediction (R²) for the test set, and we argue that this observation is a general property of any QSAR model developed with LOO cross-validation. We suggest that external validation using rationally selected training and test sets provides a means to establish a reliable QSAR model. We propose several approaches to dividing experimental datasets into training and test sets and apply them in QSAR studies of 48 functionalized amino acid anticonvulsants and a series of 157 epipodophyllotoxin derivatives with antitumor activity. We formulate a set of general criteria for evaluating the predictive power of QSAR models.
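The two statistics contrasted in the abstract can be illustrated with a minimal sketch. It is not the authors' kNN variable selection method: it uses a plain kNN regressor with no descriptor selection, the one-descriptor toy data are hypothetical, and the function names (`knn_predict`, `loo_q2`, `external_r2`) are invented for illustration. It assumes q² is computed as 1 - PRESS/SS on the training set and takes the test-set R² as the squared Pearson correlation between predicted and observed activities; other definitions are in use.

```python
import math

def knn_predict(train_X, train_y, x, k=3):
    """Predict the activity of x as the mean activity of its k nearest training compounds."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def loo_q2(X, y, k=3):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS_total on the training set."""
    mean_y = sum(y) / len(y)
    press = 0.0
    for i in range(len(X)):
        # Hold out compound i and predict it from all remaining compounds.
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        press += (y[i] - knn_predict(rest_X, rest_y, X[i], k)) ** 2
    ss_total = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - press / ss_total

def external_r2(train_X, train_y, test_X, test_y, k=3):
    """Squared Pearson correlation between predicted and observed test-set activities."""
    pred = [knn_predict(train_X, train_y, x, k) for x in test_X]
    mean_p = sum(pred) / len(pred)
    mean_o = sum(test_y) / len(test_y)
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, test_y))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in test_y)
    return cov * cov / (var_p * var_o)

# Hypothetical toy data: one descriptor per compound, activity = 2 * descriptor.
train_X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
train_y = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
test_X = [[0.5], [2.5], [4.5]]
test_y = [1.0, 5.0, 9.0]

q2 = loo_q2(train_X, train_y, k=2)
r2 = external_r2(train_X, train_y, test_X, test_y, k=2)
print(f"LOO q2 (training set) = {q2:.3f}")
print(f"external R2 (test set) = {r2:.3f}")
```

In this toy case the two numbers differ because the LOO predictions at the edges of the training range are poor while the test compounds all lie between training compounds. This only shows that the two statistics measure different things; it does not reproduce the paper's results.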
Pages: 241-253
Number of pages: 13