VARIABLE SELECTION IN QSAR STUDIES. 2. A HIGHLY EFFICIENT COMBINATION OF SYSTEMATIC SEARCH AND EVOLUTION

Times cited: 183
Author
KUBINYI, H
Affiliation
[1] BASF AG, Ludwigshafen
Source
QUANTITATIVE STRUCTURE-ACTIVITY RELATIONSHIPS | 1994 / Vol. 13 / No. 4
Keywords
CROSS-VALIDATION; EVOLUTIONARY ALGORITHM FOR VARIABLE SELECTION; GENETIC ALGORITHM; MUSEUM APPROACH FOR VARIABLE SELECTION; PRESELECTION OF VARIABLES IN REGRESSION ANALYSIS; REGRESSION ANALYSIS; SYSTEMATIC SEARCH OF REGRESSION MODELS; VARIABLE SELECTION IN QSAR STUDIES
DOI
10.1002/qsar.19940130403
Chinese Library Classification number
R914 [Medicinal chemistry]
Subject classification code
100701
Abstract
Recently, two evolutionary strategies for the derivation of regression models have been described: genetic function approximation and the mutation/selection algorithm MUSEUM. The MUSEUM (Mutation and Selection Uncover Models) algorithm starts from a model containing randomly chosen variables. Random mutation, at first by adding or eliminating only one or a few variables and later by simultaneous random additions, eliminations and/or exchanges of several variables at a time, leads to new models, which are evaluated by an appropriate fitness function. Only the "fittest" model is stored and used for further mutation and selection, leading to progressively better models. However, the fitness of all models with up to three X variables can be determined much faster by calculating the multiple correlation coefficients r(y.ij) and r(y.ijk) from the simple correlation coefficients r(yi) and r(ij) and the partial correlation coefficients r(yj.i), r(jk.i) and r(yk.ij). Using the Selwood data set (n = 31 compounds, k = 53 variables), it is demonstrated that systematic search is the best strategy for regression models with two or three X variables. The variables contained in the best three-variable models can then be selected for further investigation with the evolutionary approach. With the exception of complex models containing six or more variables, nearly all relevant regression models are found by this combination of systematic search with the mutation/selection algorithm MUSEUM, and the results are obtained in considerably less time than when all variables are included in the calculations. In addition, systematic search is a valuable tool for selecting variables prior to stepwise regression and PLS analyses.
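The speed-up behind the systematic search comes from standard correlation identities. In the abstract's notation, each partial correlation follows from lower-order coefficients as r(ab.c) = (r(ab) - r(ac) r(bc)) / sqrt((1 - r(ac)^2)(1 - r(bc)^2)), and the squared multiple correlation coefficients follow from 1 - r(y.ij)^2 = (1 - r(yi)^2)(1 - r(yj.i)^2) and 1 - r(y.ijk)^2 = (1 - r(y.ij)^2)(1 - r(yk.ij)^2). The following Python sketch applies these identities to every pair or triple of columns once the correlation matrix is known, so no least-squares fit is needed per model. It is a minimal illustration under stated assumptions, not the program used in the paper: it ranks models by r^2 (the abstract does not say which fitness function was used), skips subsets with near-collinear variables, and the names pearson, partial, systematic_search and EPS are hypothetical.

```python
import itertools
import math
from typing import List, Optional, Sequence, Tuple

EPS = 1e-12  # hypothetical tolerance: a (1 - r^2) product below this counts as zero


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Simple (zero-order) correlation coefficient of two non-constant columns."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def partial(r_ab: float, r_ac: float, r_bc: float) -> Optional[float]:
    """Partial correlation r(ab.c); None if c (almost) fully explains a or b."""
    den = (1.0 - r_ac ** 2) * (1.0 - r_bc ** 2)
    if den <= EPS:
        return None
    return (r_ab - r_ac * r_bc) / math.sqrt(den)


def systematic_search(
    X: List[List[float]], y: List[float], size: int
) -> List[Tuple[float, Tuple[int, ...]]]:
    """Return (r^2, column indices) for every 2- or 3-variable model, best first.

    X holds one list of values per descriptor column. Only the correlation
    matrix is computed from the data; each subset then costs a few arithmetic
    operations instead of a least-squares fit.
    """
    if size not in (2, 3):
        raise ValueError("size must be 2 or 3")
    r_y = [pearson(y, col) for col in X]           # r(yi) for every column i
    r_x = [[pearson(a, b) for b in X] for a in X]  # r(ij) for every pair of columns
    results = []
    for combo in itertools.combinations(range(len(X)), size):
        i, j = combo[0], combo[1]
        r_yj_i = partial(r_y[j], r_y[i], r_x[i][j])  # r(yj.i)
        if r_yj_i is None:
            continue  # degenerate (i and j collinear, or i alone fits y): skipped
        resid = (1.0 - r_y[i] ** 2) * (1.0 - r_yj_i ** 2)  # 1 - r(y.ij)^2
        if size == 3 and resid > EPS:  # if resid ~ 0, i and j already fit y exactly
            k = combo[2]
            r_yk_i = partial(r_y[k], r_y[i], r_x[i][k])        # r(yk.i)
            r_jk_i = partial(r_x[j][k], r_x[i][j], r_x[i][k])  # r(jk.i)
            if r_yk_i is None or r_jk_i is None:
                continue  # degenerate (near-collinearity with i): skipped
            r_yk_ij = partial(r_yk_i, r_yj_i, r_jk_i)          # r(yk.ij)
            if r_yk_ij is None:
                continue  # k nearly collinear with i and j: skipped
            resid *= 1.0 - r_yk_ij ** 2                        # 1 - r(y.ijk)^2
        results.append((1.0 - resid, combo))
    results.sort(key=lambda t: t[0], reverse=True)
    return results
```

For a fixed number of compounds and variables, ranking by r^2 gives the same order as ranking by the F value, because F = (r^2/p) / ((1 - r^2)/(n - p - 1)) increases monotonically with r^2. For the Selwood data set (k = 53), a complete search covers C(53, 2) = 1,378 two-variable and C(53, 3) = 23,426 three-variable models. Passing the variables of the best triples on to an evolutionary search such as MUSEUM, as the abstract describes, is not shown here.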
Pages: 393-401
Number of pages: 9
References
12 entries in total
[1] DRAPER NR, 1981, APPLIED REGRESSION A
[2] VARIABLE SELECTION IN QSAR STUDIES. 1. AN EVOLUTIONARY ALGORITHM [J].
KUBINYI, H.
QUANTITATIVE STRUCTURE-ACTIVITY RELATIONSHIPS, 1994, 13(3): 285-294
[3] KUBINYI H, 1993, 3D QSAR DRUG DESIGN, P717
[4] KUBINYI H, UNPUB
[5] ON IDENTIFYING LIKELY DETERMINANTS OF BIOLOGICAL-ACTIVITY IN HIGH-DIMENSIONAL QSAR PROBLEMS [J].
MCFARLAND, JW; GANS, DJ.
QUANTITATIVE STRUCTURE-ACTIVITY RELATIONSHIPS, 1994, 13(1): 11-17
[6] APPLICATION OF GENETIC FUNCTION APPROXIMATION TO QUANTITATIVE STRUCTURE-ACTIVITY-RELATIONSHIPS AND QUANTITATIVE STRUCTURE-PROPERTY RELATIONSHIPS [J].
ROGERS, D; HOPFINGER, AJ.
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 1994, 34(4): 854-866
[7] SACHS L, 1992, ANGEWANDTE STATISTIK
[8] STRUCTURE-ACTIVITY-RELATIONSHIPS OF ANTIFILARIAL ANTIMYCIN ANALOGS - A MULTIVARIATE PATTERN-RECOGNITION STUDY [J].
SELWOOD, DL; LIVINGSTONE, DJ; COMLEY, JCW; O'DOWD, AB; HUDSON, AT; JACKSON, P; JANDU, KS; ROSE, VS; STABLES, JN.
JOURNAL OF MEDICINAL CHEMISTRY, 1990, 33(1): 136-142
[9] CHANCE CORRELATIONS IN STRUCTURE-ACTIVITY STUDIES USING MULTIPLE REGRESSION-ANALYSIS [J].
TOPLISS, JG; COSTELLO, RJ.
JOURNAL OF MEDICINAL CHEMISTRY, 1972, 15(10): 1066 ff.
[10] CHANCE FACTORS IN STUDIES OF QUANTITATIVE STRUCTURE-ACTIVITY-RELATIONSHIPS [J].
TOPLISS, JG; EDWARDS, RP.
JOURNAL OF MEDICINAL CHEMISTRY, 1979, 22(10): 1238-1244