Performance of some variable selection methods when multicollinearity is present

Cited by: 1564
Authors
Chong, IG [1]
Jun, CH [1]
Affiliations
[1] Pohang Univ Sci & Technol, Dept Ind Engn, Pohang 790784, South Korea
Keywords
variable selection; VIP (variable importance in the projection) scores; partial least squares regression; the lasso; stepwise regression; multicollinearity
DOI
10.1016/j.chemolab.2004.12.011
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Variable selection is one of the important practical issues for many scientific engineers. Although PLS (partial least squares) regression combined with VIP (variable importance in the projection) scores is often used when multicollinearity is present among variables, there are few guidelines about its use or its performance. The purpose of this paper is to explore the nature of the VIP method and to compare it with other methods through computer simulation experiments. We design 108 experiments in which observations are generated from true models, considering four factors: the proportion of relevant predictors, the magnitude of correlations between predictors, the structure of regression coefficients, and the magnitude of signal to noise. A confusion matrix is adopted to evaluate the performance of PLS, the Lasso, and the stepwise method. We also discuss the proper cutoff value of the VIP method to increase its performance. Some practical hints for the use of the VIP method are given as simulation results. © 2005 Elsevier B.V. All rights reserved.
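As a rough illustration of the two ideas named in the abstract, the sketch below computes VIP scores from a single-response PLS fit and summarizes a variable selection with confusion-matrix counts. It is not the authors' code: the function names `pls1_vip` and `selection_confusion`, the NIPALS-style fitting loop, the centering-only preprocessing, the simulated data, and the cutoff of 1.0 are all assumptions made for illustration. It uses the commonly cited VIP definition, in which each variable's squared weights are averaged over components in proportion to the response variance each component explains.

```python
import numpy as np


def pls1_vip(X, y, n_components):
    """VIP scores from a single-response PLS fit (NIPALS-style sketch).

    Assumption: X and y are only mean-centered here, not scaled.
    """
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    n, p = Xk.shape
    W = np.zeros((p, n_components))   # unit-norm weight vectors, one per column
    ss = np.zeros(n_components)       # response sum of squares explained per component
    for a in range(n_components):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)
        t = Xk @ w                    # score vector
        tt = t @ t
        q = (yk @ t) / tt             # regression of y on the score
        p_load = (Xk.T @ t) / tt      # X loading
        Xk = Xk - np.outer(t, p_load)  # deflate X
        yk = yk - q * t                # deflate y
        W[:, a] = w
        ss[a] = q ** 2 * tt
    # VIP_j = sqrt(p * sum_a(ss_a * w_ja^2) / sum_a(ss_a))
    return np.sqrt(p * (W ** 2 @ ss) / ss.sum())


def selection_confusion(selected, relevant):
    """Return (TP, FP, FN, TN) for boolean selection and ground-truth masks."""
    tp = int(np.sum(selected & relevant))
    fp = int(np.sum(selected & ~relevant))
    fn = int(np.sum(~selected & relevant))
    tn = int(np.sum(~selected & ~relevant))
    return tp, fp, fn, tn


if __name__ == "__main__":
    # Hypothetical simulation: 10 equicorrelated predictors, the first 3 relevant.
    rng = np.random.default_rng(0)
    n, p, rho = 100, 10, 0.7
    shared = rng.normal(size=(n, 1))
    X = np.sqrt(rho) * shared + np.sqrt(1 - rho) * rng.normal(size=(n, p))
    relevant = np.arange(p) < 3
    y = X[:, relevant].sum(axis=1) + rng.normal(size=n)

    vip = pls1_vip(X, y, n_components=2)
    selected = vip > 1.0              # assumed cutoff, not a recommendation
    print(selection_confusion(selected, relevant))
```

Because each weight vector has unit norm, the squared VIP scores sum to the number of predictors, so their mean is 1. That property is the usual motivation for a cutoff near 1, although the abstract indicates that the appropriate cutoff is itself a subject of the study.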
Pages: 103-112
Number of pages: 10