Prediction of aqueous solubility and partition coefficient optimized by a genetic algorithm based descriptor selection method

Cited by: 84
Authors
Wegner, JK [1]
Zell, A [1]
Affiliations
[1] Univ Tubingen, Zentrum Bioinformat Tubingen, D-72076 Tubingen, Germany
Source
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 2003 / Vol. 43 / Issue 03
DOI
10.1021/ci034006u
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
The paper describes a fast and flexible descriptor selection method using a genetic algorithm variant (GA-SEC). The relevance of the descriptors will be measured using Shannon entropy (SE) and differential Shannon entropy (DSE), which have very sparse memory requirements and allow the processing of huge data sets. A small quantity of the most important descriptors will be used automatically to build a value prediction model. The most important descriptors are not a linear combination of other descriptors, but transparent, pure descriptors. We used an artificial neural network (ANN) model to predict the aqueous solubility logS and the octanol/water partition coefficient logP. The logS data set was divided into a training set of 1016 compounds and a test set of 253 compounds. A correlation coefficient of 0.93 and an empirical standard deviation of 0.54 were achieved. The logP data set was divided into a training set of 1853 compounds and a test set of 138 compounds. A correlation coefficient of 0.92 and an empirical standard deviation of 0.44 were achieved.
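As a rough illustration of the entropy-based relevance measure the abstract mentions, the sketch below computes the Shannon entropy of a binned descriptor and a differential Shannon entropy between two compound sets. It is not the authors' GA-SEC implementation. The function names, the equal-width binning, the default bin count, and the DSE form (entropy of the pooled set minus the mean of the two individual entropies) are all assumptions made for this example and may differ from the paper.

```python
import math
from collections import Counter


def shannon_entropy(values, n_bins=10, lo=None, hi=None):
    """Shannon entropy (in bits) of one descriptor's equal-width histogram.

    Assumption: equal-width bins over [lo, hi]; the paper may bin differently.
    """
    values = list(values)
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:
        # A constant descriptor carries no information.
        return 0.0
    width = (hi - lo) / n_bins
    # Clamp the top edge into the last bin.
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def differential_shannon_entropy(set_a, set_b, n_bins=10):
    """Assumed DSE form: SE(A and B pooled) - (SE(A) + SE(B)) / 2.

    All three entropies share one bin range so they are comparable.
    """
    pooled = list(set_a) + list(set_b)
    lo, hi = min(pooled), max(pooled)
    se_a = shannon_entropy(set_a, n_bins, lo, hi)
    se_b = shannon_entropy(set_b, n_bins, lo, hi)
    se_ab = shannon_entropy(pooled, n_bins, lo, hi)
    return se_ab - (se_a + se_b) / 2
```

Only a histogram of counts per descriptor is kept, which is consistent with the abstract's remark about low memory requirements. Under this assumed definition, a descriptor whose value distribution is the same in both sets gives a DSE of zero, while a descriptor that separates the sets gives a positive value.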
Pages: 1077-1084
Number of pages: 8