Genetic algorithm guided selection: Variable selection and subset selection

Cited by: 117
Authors
Cho, SJ
Hermsmeier, MA
Affiliations
[1] Bristol Myers Squibb Co, New Leads, Wallingford, CT 06492 USA
[2] Bristol Myers Squibb Co, New Leads, Princeton, NJ 08543 USA
Source
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 2002, Vol. 42, No. 4
DOI
10.1021/ci010247v
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
A novel Genetic Algorithm guided Selection method, GAS, is described. The method uses a simple encoding scheme that can represent both the compounds and the variables used to construct a QSAR/QSPR model. A genetic algorithm then simultaneously optimizes the encoded variables, which include both descriptors and compound subsets. The GAS method generates multiple models, each applying to a subset of the compounds; typically, the subsets represent clusters with different chemotypes. A procedure based on molecular similarity is also presented to determine which model should be applied to a given test set compound. The variable selection method implemented in GAS was tested and compared using the Selwood data set (n = 31 compounds; v = 53 descriptors), and the results showed that it is comparable to other published methods. The subset selection method implemented in GAS was first tested on an artificial data set (n = 100 points; v = 1 descriptor) to examine its ability to subset data points and then applied to the XLOGP data set (n = 1831 compounds; v = 126 descriptors). The method correctly identifies the artificial data points belonging to the various subsets. The analysis of the XLOGP data set shows that subset selection can be useful in improving a QSAR/QSPR model when variable selection fails.
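The subset selection idea in the abstract can be illustrated with a small sketch. It is not the GAS implementation: the abstract does not specify the encoding, operators, or fitness function, so everything below is an assumption made for illustration. The chromosome here holds only one subset label per data point (no descriptor bits), each subset gets its own one-descriptor least-squares line, and the fitness is the summed squared error. The names `fit_line`, `fitness`, `make_two_line_data`, and `evolve` are hypothetical, and the synthetic data only loosely mirrors the one-descriptor artificial data set mentioned above.

```python
import random

def fit_line(points):
    """Least-squares line y = a*x + b for (x, y) pairs; returns (a, b, sse)."""
    n = len(points)
    if n < 2:
        # Zero or one point: nothing to fit, so the error is zero by convention.
        return 0.0, (points[0][1] if points else 0.0), 0.0
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx if sxx > 0 else 0.0
    b = my - a * mx
    sse = sum((y - (a * x + b)) ** 2 for x, y in points)
    return a, b, sse

def fitness(chrom, data, k):
    """Summed squared error when each of the k subsets gets its own line (lower is better)."""
    total = 0.0
    for s in range(k):
        subset = [p for p, label in zip(data, chrom) if label == s]
        total += fit_line(subset)[2]
    return total

def make_two_line_data(n_per_line=20, seed=0):
    """Assumed toy data: noisy points drawn from two different straight lines."""
    rng = random.Random(seed)
    data = []
    for slope, intercept in ((2.0, 1.0), (-1.0, 10.0)):
        for _ in range(n_per_line):
            x = rng.uniform(0.0, 10.0)
            data.append((x, slope * x + intercept + rng.gauss(0.0, 0.1)))
    return data

def evolve(data, k=2, pop_size=40, generations=200, p_mut=0.02, seed=0):
    """Minimal elitist GA over subset labels; returns (best chromosome, best-fitness history)."""
    rng = random.Random(seed)
    n = len(data)
    pop = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, data, k))
        history.append(fitness(pop[0], data, k))
        elite = pop[: pop_size // 2]          # the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, n)         # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: occasionally reassign a point to a random subset.
            child = [rng.randrange(k) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = elite + children
    pop.sort(key=lambda c: fitness(c, data, k))
    history.append(fitness(pop[0], data, k))
    return pop[0], history

if __name__ == "__main__":
    points = make_two_line_data()
    best, history = evolve(points)
    print("initial error:", history[0], "final error:", history[-1])
```

Because the better half of the population is carried over unchanged, the best error never increases from one generation to the next. The sketch ignores practical concerns a real method would need to handle, such as minimum subset sizes and the fact that swapping subset labels yields an equivalent solution, which weakens simple one-point crossover.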
Pages: 927-936
Page count: 10