A systematic evaluation of the benefits and hazards of variable selection in latent variable regression. Part I. Search algorithm, theory and simulations

Cited by: 83
Authors
Baumann, K [1]
Albert, H [1]
von Korff, M [1]
Affiliation
[1] Univ Wurzburg, Dept Pharm, D-97074 Wurzburg, Germany
Keywords
cross-validation; variable selection; PLS; PCR; tabu search
DOI
10.1002/cem.730
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Variable selection is an extensively studied problem in chemometrics and in the area of quantitative structure-activity relationships (QSARs). Many search algorithms have been compared so far. Less well studied is the influence of different objective functions on the prediction quality of the selected models. This paper investigates the performance of different cross-validation techniques as objective functions for variable selection in latent variable regression. The results are compared in terms of predictive ability, model size (number of variables) and model complexity (number of latent variables). It is shown that leave-multiple-out cross-validation with a large percentage of data left out performs best. Since leave-multiple-out cross-validation is computationally expensive, a very efficient tabu search algorithm is introduced to lower the computational burden. The tabu search algorithm needs no user-defined operational parameters and optimizes the variable subset and the number of latent variables simultaneously. Copyright © 2002 John Wiley & Sons, Ltd.
Pages: 339-350
Number of pages: 12