QSAR - How good is it in practice? Comparison of descriptor sets on an unbiased cross section of corporate data sets

Cited by: 132
Authors
Gedeck, Peter
Rohde, Bernhard
Bartels, Christian
Affiliations
[1] Novartis Horsham Res Ctr, Novartis Inst BioMed Res, Horsham RH12 5AB, W Sussex, England
[2] Novartis Inst BioMed Res, CH-4002 Basel, Switzerland
DOI
10.1021/ci050413p
Chinese Library Classification number
R914 [Medicinal Chemistry];
Discipline classification code
100701;
Abstract
The quality of QSAR (Quantitative Structure-Activity Relationship) predictions depends on a large number of factors, including the descriptor set, the statistical method, and the data sets used. Here we study the quality of QSAR predictions mainly as a function of the data set and descriptor type, using partial least squares as the statistical modeling method. The study makes use of the fact that we have access to a large number of data sets and to a variety of different QSAR descriptors. The main conclusion is that the quality of the predictions depends on both the data set and the descriptor used. The quality of the predictions correlates positively with the size of the data set and the range of biological activities. There is no clear dependence of the quality of the predictions on the complexity of the data set. All of the descriptors tested produced useful predictions for some of the data sets. None of the descriptors is best for all data sets; it is therefore necessary to test in each individual case which descriptor produces the best model. In our tests, 2D fragment-based descriptors usually performed better than simpler descriptors based on augmented atom types. Possible reasons for these observations are discussed.
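The comparison described in the abstract (one statistical method, several descriptor sets, quality judged per data set) can be illustrated with a small sketch. This is not the authors' code or protocol: it assumes a single-response PLS fitted by NIPALS deflation, k-fold cross-validated q² as the quality measure, and synthetic random numbers in place of real descriptors and activities. All function and variable names (`pls1_fit`, `pls1_predict`, `cross_validated_q2`, `descriptor_sets`) are hypothetical.

```python
import random

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model using NIPALS-style deflation."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    yc = [v - y_mean for v in y]
    comps = []
    for _ in range(n_components):
        # Weight vector: direction in descriptor space most covariant with y.
        w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [sum(Xc[i][j] * w[j] for j in range(m)) for i in range(n)]
        tt = sum(v * v for v in t)
        if tt < 1e-12:
            break
        p = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(yc[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        Xc = [[Xc[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        yc = [yc[i] - q * t[i] for i in range(n)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps

def pls1_predict(model, x):
    """Predict one sample by replaying the deflation steps on it."""
    x_mean, y_mean, comps = model
    r = [a - b for a, b in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in comps:
        t = sum(a * b for a, b in zip(r, w))
        y_hat += q * t
        r = [a - t * b for a, b in zip(r, p)]
    return y_hat

def cross_validated_q2(X, y, n_components=2, n_folds=5, seed=0):
    """Return q2 = 1 - PRESS / SS_total from k-fold cross-validation."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    press = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = pls1_fit([X[i] for i in train], [y[i] for i in train], n_components)
        press += sum((y[i] - pls1_predict(model, X[i])) ** 2 for i in held_out)
    y_mean = sum(y) / len(y)
    ss_total = sum((v - y_mean) ** 2 for v in y)
    return 1.0 - press / ss_total

# Synthetic stand-ins (assumption): one descriptor set that carries the
# signal and one that is pure noise, both describing the same "compounds".
rng = random.Random(42)
n_compounds = 60
informative = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(n_compounds)]
activity = [2.0 * row[0] - 1.0 * row[1] + rng.gauss(0, 0.1) for row in informative]
noise = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(n_compounds)]

descriptor_sets = {"informative": informative, "noise": noise}
q2 = {name: cross_validated_q2(X, activity) for name, X in descriptor_sets.items()}
for name, value in q2.items():
    print(f"{name}: q2 = {value:.3f}")
```

Running the same loop over each data set and each descriptor type gives a table of q² values. Such a table is one possible way to see that no single descriptor wins everywhere, as the abstract reports.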
Pages: 1924-1936
Page count: 13
Related papers
29 in total
[1]   Characterization of flexible molecules in solution: the RGDW peptide [J].
Bartels, C ;
Stote, RH ;
Karplus, M .
JOURNAL OF MOLECULAR BIOLOGY, 1998, 284 (05) :1641-1660
[2]   The properties of known drugs .1. Molecular frameworks [J].
Bemis, GW ;
Murcko, MA .
JOURNAL OF MEDICINAL CHEMISTRY, 1996, 39 (15) :2887-2893
[3]   Robust QSAR models using Bayesian regularized neural networks [J].
Burden, FR ;
Winkler, DA .
JOURNAL OF MEDICINAL CHEMISTRY, 1999, 42 (16) :3183-3187
[4]   SAMPLE-DISTANCE PARTIAL LEAST-SQUARES - PLS OPTIMIZED FOR MANY VARIABLES, WITH APPLICATION TO COMFA [J].
BUSH, BL ;
NACHBAR, RB .
JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN, 1993, 7 (05) :587-619
[5]   LOCALLY WEIGHTED REGRESSION - AN APPROACH TO REGRESSION-ANALYSIS BY LOCAL FITTING [J].
CLEVELAND, WS ;
DEVLIN, SJ .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1988, 83 (403) :596-610
[6]   Reoptimization of MDL keys for use in drug discovery [J].
Durant, JL ;
Leland, BA ;
Henry, DR ;
Nourse, JG .
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2002, 42 (06) :1273-1280
[7]   GHOSE A, 1986, J COMPUT CHEM, V4, P565
[8]   Prediction of hydrophobic (lipophilic) properties of small organic molecules using fragmental methods: An analysis of ALOGP and CLOGP methods [J].
Ghose, AK ;
Viswanadhan, VN ;
Wendoloski, JJ .
JOURNAL OF PHYSICAL CHEMISTRY A, 1998, 102 (21) :3762-3772
[9]   Predictive QSAR modeling based on diversity sampling of experimental datasets for the training and test set selection [J].
Golbraikh, A ;
Tropsha, A .
JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN, 2002, 16 (5-6) :357-369
[10]   Assessing model fit by cross-validation [J].
Hawkins, DM ;
Basak, SC ;
Mills, D .
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2003, 43 (02) :579-586