Development and validation of a novel variable selection technique with application to multidimensional quantitative structure-activity relationship studies

Cited by: 91
Authors
Waller, CL
Bradley, MP
Affiliations
[1] OSI Pharmaceut Inc, Durham, NC 27707 USA
[2] Rhone Poulenc Agro, Res Triangle Pk, NC 27709 USA
Source
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 1999, Vol. 39, No. 2
DOI
10.1021/ci980405r
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Variable selection is typically a time-consuming and ambiguous procedure in performing quantitative structure-activity relationship (QSAR) studies on overdetermined (regressor-heavy) data sets. A variety of techniques, including stepwise and partial least squares/principal components analysis (PLS/PCA) regression, have been applied to this common problem. Other strategies, such as neural networks, cluster significance analysis, nearest neighbor, or genetic (function) or evolutionary algorithms, have also been evaluated. A simple random selection strategy that implements iterative generation of models, but directly avoids crossover and mutation, has been developed and is implemented herein to rapidly identify from a pool of allowable variables those which are most closely associated with a given response variable. The FRED (fast random elimination of descriptors) algorithm begins with a population of offspring models composed of either a fixed or variable number of randomly selected variables. Iterative elimination of descriptors leads naturally to subsequent generations of more fit offspring models. In contrast to common genetic and evolutionary algorithms, only those descriptors determined to contribute to the genetic makeup of less fit offspring models are eliminated from the descriptor pool. After every generation, a new random increment line search of the remaining descriptors initiates the development of the next generation of randomly constructed models. An optional algorithm that eliminates highly correlated descriptors in a stepwise manner prior to the development of the first generation of offspring greatly enhances the efficiency of the FRED algorithm. A comparison of the results of a FRED analysis of the Selwood data set (n = 31 compounds, k = 53 descriptors) with those obtained from alternative algorithms reveals that this technique is capable of identifying the same "optimal" solutions in an efficient manner.
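The elimination loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `fred_sketch`, its parameters, the caller-supplied `fitness` function, and the rule "drop descriptors that appear only in the less fit models of a generation" are all assumptions made for illustration, since the abstract does not give the exact elimination criterion or fitness measure. The "random increment line search" and the optional correlation prefilter are not modeled.

```python
import random
from typing import Callable, FrozenSet, Optional, Sequence, Set, Tuple

Model = FrozenSet[str]


def fred_sketch(
    descriptors: Sequence[str],
    fitness: Callable[[Model], float],  # hypothetical: higher score = fitter model
    model_size: int = 3,                # fixed number of descriptors per model
    population: int = 30,               # models generated per generation
    fit_fraction: float = 0.5,          # assumed share of models treated as "fit"
    max_generations: int = 50,
    seed: int = 0,
) -> Tuple[Set[str], Optional[Model]]:
    """Return the surviving descriptor pool and the best model seen."""
    pool = set(descriptors)
    if model_size > len(pool):
        raise ValueError("model_size exceeds the number of descriptors")
    rng = random.Random(seed)
    best_model: Optional[Model] = None
    best_score = float("-inf")

    for _ in range(max_generations):
        # Build a generation of purely random models (no crossover, no mutation).
        candidates = sorted(pool)  # sorted so a given seed is reproducible
        models = [frozenset(rng.sample(candidates, model_size))
                  for _ in range(population)]
        ranked = sorted(models, key=fitness, reverse=True)

        top_score = fitness(ranked[0])
        if top_score > best_score:
            best_model, best_score = ranked[0], top_score

        # Split the generation into fit and less fit models.
        n_fit = max(1, int(len(ranked) * fit_fraction))
        in_fit = set().union(*ranked[:n_fit])
        in_less_fit = set().union(*ranked[n_fit:])

        # Assumed rule: eliminate descriptors found only in less fit models.
        doomed = in_less_fit - in_fit
        if not doomed:
            break  # nothing left to eliminate
        pool -= doomed

    return pool, best_model
```

In a QSAR setting, `fitness` might wrap something like a cross-validated regression statistic computed on the chosen descriptor columns; that choice is left to the caller here. Because the fit models of each generation always keep their descriptors, the pool never shrinks below `model_size`.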
Pages: 345-355
Page count: 11