A variable elimination method to improve the parsimony of MLR models using the successive projections algorithm

Cited by: 266
Authors
Galvao, Roberto Kawakami Harrop [2]
Ugulino Araujo, Mario Cesar [1]
Fragoso, Wallace Duarte [1]
Silva, Edvan Cirino [1]
Jose, Gledson Emidio [1]
Carreiro Soares, Sofacles Figueredo [1]
Paiva, Henrique Mohallem [3]
Affiliations
[1] Univ Fed Paraiba, CCEN, Dept Quim, BR-58051970 Joao Pessoa, Paraiba, Brazil
[2] Inst Tecnol Aeronaut, Div Engn Eletron, BR-12228900 Sao Jose Dos Campos, SP, Brazil
[3] EMBRAER, BR-12227901 Sao Jose Dos Campos, SP, Brazil
Keywords
multiple linear regression; variable selection; successive projections algorithm; near-infrared spectrometry; diesel analysis; corn analysis
DOI
10.1016/j.chemolab.2007.12.004
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The successive projections algorithm (SPA) is a variable selection technique designed to minimize collinearity problems in multiple linear regression (MLR). This paper proposes a modification to the basic SPA formulation aimed at further improving the parsimony of the resulting MLR model. For this purpose, an elimination procedure is incorporated into the algorithm in order to remove variables that do not effectively contribute to the prediction ability of the model, as indicated by an F-test. The utility of the proposed modification is illustrated in a simulation study, as well as in two application examples involving the analysis of diesel and corn samples by near-infrared (NIR) spectroscopy. The results demonstrate that the number of variables selected by SPA can be reduced without significantly compromising prediction performance. In addition, SPA compares favourably with classic Stepwise Regression and full-spectrum PLS. A graphical user interface for SPA is available at www.ele.ita.br/~kawakami/spa/. (C) 2008 Elsevier B.V. All rights reserved.
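The two ideas named in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption rather than the authors' procedure: the function names `spa_chain` and `smallest_adequate_size` are hypothetical, the prediction error sums of squares (PRESS) for each candidate model size are taken as already computed, and the critical F value is supplied by the caller (in practice it would come from an F distribution with suitable degrees of freedom). The first function builds a chain of minimally collinear variables by successive projections; the second keeps the smallest model whose PRESS is not judged significantly larger than the minimum.

```python
import math


def spa_chain(columns, start, n_select):
    """Return variable indices picked by successive projections.

    `columns` is a list of variables, each a list of sample values
    (assumed mean-centred and non-zero). `start` is the first variable.
    """
    work = [list(col) for col in columns]  # working copy that gets projected
    selected = [start]
    while len(selected) < n_select:
        ref = work[selected[-1]]
        ref_norm2 = sum(v * v for v in ref)
        best, best_norm = None, -1.0
        for j, col in enumerate(work):
            if j in selected:
                continue
            # Remove from column j its component along the last selected column.
            scale = sum(a * b for a, b in zip(col, ref)) / ref_norm2
            work[j] = [a - scale * b for a, b in zip(col, ref)]
            norm = math.sqrt(sum(v * v for v in work[j]))
            if norm > best_norm:
                best, best_norm = j, norm
        # The variable with the largest residual norm is the least collinear.
        selected.append(best)
    return selected


def smallest_adequate_size(press_by_size, f_critical):
    """Smallest model size whose PRESS / min(PRESS) stays below f_critical.

    `press_by_size` maps a number of variables to its validation PRESS
    (hypothetical input); `f_critical` is assumed to be greater than 1.
    """
    press_min = min(press_by_size.values())
    for size in sorted(press_by_size):
        if press_by_size[size] / press_min < f_critical:
            return size


# Toy data: variable 1 is nearly collinear with variable 0, so it is picked last.
toy_columns = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.5]]
chain = spa_chain(toy_columns, start=0, n_select=3)

# Illustrative PRESS values only, not results from the paper.
toy_press = {1: 9.0, 2: 2.4, 3: 2.1, 4: 2.0}
size = smallest_adequate_size(toy_press, f_critical=1.5)
```

With the toy inputs, `chain` is `[0, 2, 3]` and `size` is `2`: the model with two variables is kept because its PRESS ratio (1.2) falls below the assumed critical value, while the one-variable model (ratio 4.5) does not.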
Pages: 83-91
Number of pages: 9