Prediction of n-octanol/water partition coefficients from PHYSPROP database using artificial neural networks and E-state indices

Cited by: 346
Authors
Tetko, IV
Tanchuk, VY
Villa, AEP
Affiliations
[1] Univ Lausanne, Inst Physiol, Lab Neuro Heurist, CH-1005 Lausanne, Switzerland
[2] Inst Bioorgan & Petr Chem, Dept Biomed, UA-253660 Kiev, Ukraine
Source
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 2001 / Vol. 41 / No. 5
DOI
10.1021/ci010368v
Chinese Library Classification
O6 [Chemistry]
Subject classification code
0703
Abstract
A new method, ALOGPS v 2.0 (http://www.lnh.unil.ch/~itetko/logp/), for estimating the n-octanol/water partition coefficient, log P, was developed on the basis of neural network ensemble analysis of 12 908 organic compounds available from the PHYSPROP database of Syracuse Research Corporation. Atom- and bond-type E-state indices, together with the numbers of hydrogen and non-hydrogen atoms, were used to represent the molecular structures. A preliminary selection of indices was performed by multiple linear regression analysis, and 75 input parameters were chosen. Some of these parameters combined several atom-type or bond-type indices with similar physicochemical properties. The neural network ensemble was trained with the efficient partition algorithm developed by the authors. The ensemble contained 50 neural networks, each with 10 neurons in a single hidden layer. The predictive ability of the approach was estimated using both the leave-one-out (LOO) technique and a training/test protocol. For interseries predictions, i.e., when molecules in the training and test subsets were selected at random from the same set of compounds, both approaches gave similar results. ALOGPS performed significantly better than the other tested methods. For a subset of 12 777 molecules, LOO yielded a squared correlation coefficient r² = 0.95, a root mean squared error RMSE = 0.39, and a mean absolute error MAE = 0.29. For two cross-series predictions, i.e., when molecules in the training and test sets belong to different series of compounds, all analyzed methods performed less accurately. This decrease in performance can be explained by differences in the diversity of molecules in the training and test sets. However, even in these difficult cases, ALOGPS provided better predictive ability than the other tested methods.
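To illustrate how LOO statistics such as r², RMSE and MAE are obtained, here is a minimal Python sketch. It is not the ALOGPS neural network ensemble: for brevity it swaps in a plain multiple linear regression (the abstract mentions MLR only for preselecting indices), and the function names and toy descriptor data are hypothetical. r² is taken here as the squared Pearson correlation between observed and predicted values, which is one common definition.

```python
import math

# Hypothetical sketch: an MLR model stands in for the ALOGPS ANN ensemble.

def regression_metrics(y_true, y_pred):
    """Return (r2, rmse, mae); r2 is the squared Pearson correlation."""
    n = len(y_true)
    errs = [p - t for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errs) / n)
    mae = sum(abs(e) for e in errs) / n
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    vt = sum((t - mt) ** 2 for t in y_true)
    vp = sum((p - mp) ** 2 for p in y_pred)
    return cov * cov / (vt * vp), rmse, mae

def fit_mlr(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(x) for x in X]  # prepend intercept term
    k = len(rows[0])
    # Augmented normal-equation matrix [A^T A | A^T y]
    M = []
    for i in range(k):
        row = [sum(a[i] * a[j] for a in rows) for j in range(k)]
        row.append(sum(a[i] * t for a, t in zip(rows, y)))
        M.append(row)
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]

def predict(coef, x):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))

def loo_predictions(X, y):
    """Predict each molecule with a model refitted on all the others."""
    preds = []
    for i in range(len(y)):
        coef = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(coef, X[i]))
    return preds

# Hypothetical toy data: two made-up descriptors per molecule, logP-like target
X = [[1.0, 0.0], [2.0, 1.0], [3.0, 1.0], [4.0, 3.0],
     [5.0, 2.0], [6.0, 5.0], [7.0, 4.0], [8.0, 6.0]]
y = [0.9, 1.6, 2.4, 2.8, 3.9, 4.2, 5.1, 5.7]
r2, rmse, mae = regression_metrics(y, loo_predictions(X, y))
print(f"LOO r2={r2:.2f} RMSE={rmse:.2f} MAE={mae:.2f}")
```

Because each molecule is predicted by a model that never saw it, the LOO loop refits the model once per compound, which is what makes LOO expensive for sets as large as the 12 777 molecules above.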
We have shown that the diversity of the training sets, rather than the design of the methods, is the main factor determining their predictive ability for new data. A comparison of the methods' performance, and of its dependence on the number of non-hydrogen atoms in a molecule, is also presented.
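One simple way to examine how prediction error depends on molecular size is to bin molecules by their number of non-hydrogen atoms and compute the RMSE within each bin. The Python sketch below assumes observed and predicted log P values and atom counts are already available; the function name `rmse_by_size`, the default bin width of 10 atoms, and the example values are hypothetical, and the binning used in the paper may differ.

```python
import math
from collections import defaultdict

def rmse_by_size(n_heavy, y_true, y_pred, bin_width=10):
    """Group molecules by non-hydrogen atom count (hypothetical helper).

    Returns {bin_start: (count, rmse)} ordered by bin_start.
    """
    sq_errs = defaultdict(list)
    for n, t, p in zip(n_heavy, y_true, y_pred):
        start = (n // bin_width) * bin_width  # lower edge of the size bin
        sq_errs[start].append((p - t) ** 2)
    return {b: (len(v), math.sqrt(sum(v) / len(v)))
            for b, v in sorted(sq_errs.items())}

# Hypothetical example values, not data from the paper
BIN = 10
n_heavy = [6, 9, 14, 18, 23, 31, 35]
obs = [1.2, 2.0, 2.7, 3.1, 4.0, 4.8, 5.5]
pred = [1.1, 2.2, 2.5, 3.4, 3.6, 5.4, 4.9]
for start, (count, err) in rmse_by_size(n_heavy, obs, pred, BIN).items():
    print(f"{start}-{start + BIN - 1} heavy atoms: n={count}, RMSE={err:.2f}")
```

Reporting the count alongside each RMSE matters because bins for very small or very large molecules may hold few compounds, making their error estimates noisy.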
Pages: 1407-1421
Page count: 15