Reliability of logP predictions based on calculated molecular descriptors: A critical review

Times cited: 76
Authors
Erös, D
Kövesdi, I
Örfi, L
Takács-Novák, K
Acsády, G
Kéri, G
Affiliations
[1] Semmelweis Univ, Dept Pharmaceut Chem, Cooperat Res Ctr, H-1092 Budapest, Hungary
[2] Semmelweis Univ, Dept Med Chem, Peptide Biochem Res Grp, H-1088 Budapest, Hungary
[3] Vichem Chem Ltd, H-1022 Budapest, Hungary
[4] Semmelweis Univ, Dept Cardiovasc Surg, H-1122 Budapest, Hungary
Keywords
logP prediction; drugs; neural network; lipophilicity; in-silico-screening; combinatorial libraries
DOI
10.2174/0929867023369042
Chinese Library Classification (CLC) numbers
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Correct QSAR analysis requires reliable measured or calculated logP values, since logP is the most frequently used and most important physicochemical parameter in such studies. Since the theoretical fundamentals of logP prediction were published, many commercial software solutions have become available. These programs are all built on large databases of experimental data; therefore, their predicted logP values are mostly acceptable, especially for known structures and their derivatives. In this study we critically reviewed the published methods and compared the predictive power of commercial programs (CLOGP, KOWWIN, SciLogP/ULTRA) with each other and with our recently developed automatic QS(P)AR program. We selected a highly diverse set of 625 compounds, known drugs (98%) and drug-like molecules, with experimentally validated logP values. We also collected 78 reported "outliers" that could not be predicted by the "traditional" methods. We used these data for model building and validation. Finally, we used an external validation set of compounds not present in public databases. We emphasized the importance of data quality and of descriptor calculation and selection, and presented a general, reliable descriptor selection and validation technique for such studies. Our method is based on the strictest mathematical and statistical rules and is fully automatic: after the initial settings, there is no option for user intervention. Three approaches were applied: multiple linear regression, partial least squares analysis, and an artificial neural network. LogP predictions from the multiple linear regression model showed acceptable accuracy for new compounds, so the model can be used for "in-silico-screening" and/or for planning virtual/combinatorial libraries.
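To make the multiple linear regression (MLR) step concrete, here is a minimal sketch. It assumes descriptors have already been calculated for each molecule. It fits logP ≈ b0 + b1*d1 + ... + bk*dk by ordinary least squares and then checks the model on a held-out external set using R² and RMSE. The function names (`fit_mlr`, `predict`, `r2`, `rmse`), the two descriptor columns, and the toy data are hypothetical and synthetic. This is not the authors' automatic QS(P)AR program, and it leaves out their descriptor selection as well as the PLS and neural-network models.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def fit_mlr(X: Matrix, y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    A column of ones is prepended for the intercept, so the result is
    [b0, b1, ..., bk] for k descriptor columns.
    """
    design = [[1.0, *map(float, row)] for row in X]
    k = len(design[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [[sum(r[i] * r[j] for r in design) for j in range(k)]
           + [sum(r[i] * yi for r, yi in zip(design, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda idx: abs(aug[idx][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("descriptors are collinear; drop one before fitting")
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot_val = aug[col][col]
        aug[col] = [v / pivot_val for v in aug[col]]
        for other in range(k):
            if other != col:
                factor = aug[other][col]
                aug[other] = [vo - factor * vc for vo, vc in zip(aug[other], aug[col])]
    # After elimination the left block is the identity; the last column holds b.
    return [aug[i][k] for i in range(k)]


def predict(b: Sequence[float], X: Matrix) -> List[float]:
    """Apply the fitted linear model to rows of descriptors."""
    return [b[0] + sum(bi * xi for bi, xi in zip(b[1:], row)) for row in X]


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination on whichever set is passed in."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error of the predictions."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


if __name__ == "__main__":
    # Synthetic example: two hypothetical descriptors per molecule, with
    # "logP" generated from a known linear rule (not real data).
    train_X = [[100, 0], [150, 1], [200, 2], [250, 1], [300, 3], [120, 2]]
    train_y = [0.5 + 0.02 * a - 0.6 * d for a, d in train_X]
    coeffs = fit_mlr(train_X, train_y)
    ext_X = [[180, 0], [220, 2], [90, 1]]  # hypothetical external set
    ext_y = [0.5 + 0.02 * a - 0.6 * d for a, d in ext_X]
    ext_pred = predict(coeffs, ext_X)
    print([round(c, 4) for c in coeffs], r2(ext_y, ext_pred), rmse(ext_y, ext_pred))
```

Solving the normal equations directly is adequate for a handful of descriptors. With many correlated descriptors, a QR- or SVD-based solver such as `numpy.linalg.lstsq` is numerically safer, and explicit descriptor selection becomes essential.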
Pages: 1819-1829
Number of pages: 11