Unsupervised forward selection: A method for eliminating redundant variables

Cited by: 148
Authors
Whitley, DC
Ford, MG
Livingstone, DJ
Affiliations
[1] Univ Portsmouth, Inst Biomed & Biomol Sci, Ctr Mol Design, Portsmouth PO1 2DY, Hants, England
[2] ChemQuest, Sandown PO36 8LZ, Isle Wight, England
Source
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 2000 / Vol. 40 / No. 5
DOI
10.1021/ci000384c
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
An unsupervised learning method is proposed for variable selection, and its performance is assessed using three typical QSAR data sets. The aims of this procedure are to generate a subset of descriptors from any given data set in which the resultant variables are relevant, redundancy is eliminated, and multicollinearity is reduced. Continuum regression, an algorithm encompassing ordinary least squares regression, regression on principal components, and partial least squares regression, was used to construct models from the selected variables. The variable selection routine is shown to produce simple, robust, and easily interpreted models for the chosen data sets.
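The abstract does not spell out the selection rule, so the following is only a minimal sketch of how a response-free forward selection might reduce redundancy and multicollinearity. It assumes a greedy rule: start from the least-correlated pair of descriptors, then repeatedly add the descriptor with the smallest squared multiple correlation with those already chosen, stopping once that value exceeds a threshold. The function name `forward_select_low_redundancy` and the parameter `r2_max` are hypothetical, and this should not be read as the authors' published procedure.

```python
import numpy as np


def forward_select_low_redundancy(X, r2_max=0.9):
    """Greedy column selection that never looks at a response variable.

    X      : (n_samples, n_features) array of descriptors
    r2_max : hypothetical threshold; selection stops when every remaining
             column has a squared multiple correlation above it
    Returns the indices of the selected columns of X.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # mean-centre each column
    norms = np.linalg.norm(Xc, axis=0)
    keep = np.flatnonzero(norms > 1e-12)         # ignore constant columns
    if keep.size < 2:
        return [int(k) for k in keep]
    Z = Xc[:, keep] / norms[keep]                # unit-length columns

    # Start from the pair with the smallest squared pairwise correlation.
    C2 = (Z.T @ Z) ** 2
    np.fill_diagonal(C2, np.inf)
    i, j = np.unravel_index(np.argmin(C2), C2.shape)
    if C2[i, j] > r2_max:                        # everything is near-collinear
        return [int(keep[i])]
    selected = [int(i), int(j)]
    remaining = [k for k in range(Z.shape[1]) if k not in selected]

    while remaining:
        # Orthonormal basis for the span of the selected columns.
        Q, _ = np.linalg.qr(Z[:, selected])
        # Columns are centred with unit length, so the squared multiple
        # correlation equals the squared length of the projection onto Q.
        r2 = np.sum((Q.T @ Z[:, remaining]) ** 2, axis=0)
        best = int(np.argmin(r2))
        if r2[best] > r2_max:
            break
        selected.append(remaining.pop(best))

    return [int(keep[k]) for k in selected]
```

In this sketch, lowering `r2_max` yields a smaller, less collinear subset, while raising it towards 1 keeps more descriptors. Because the activity values are never consulted, the same subset can be passed to any downstream regression method.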
Pages: 1160-1168
Page count: 9