Feature selection in principal component analysis of analytical data

Cited by: 119
Authors
Guo, Q
Wu, W
Massart, DL
Boucon, C
de Jong, S
Affiliations
[1] Free Univ Brussels, Inst Pharmaceut, ChemoAC, B-1090 Brussels, Belgium
[2] SmithKline Beecham Pharmaceut, Safety Assessment, Welwyn Garden City AL6 9AR, Herts, England
[3] Unilever Res Labs Vlaardingen, NL-3133 AT Vlaardingen, Netherlands
Keywords
feature selection; principal component analysis; genetic algorithm; generalised procrustes analysis; data mining; gas chromatography
DOI
10.1016/S0169-7439(01)00203-9
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A feature selection method is proposed to select a subset of variables in principal component analysis (PCA) that preserves as much of the information present in the complete data as possible. The information is measured by means of the percentage of consensus in generalised Procrustes analysis. To avoid an exhaustive search, the best subset of variables is obtained by applying a genetic algorithm (GA) that optimises the consensus between the subset and the complete data set. The method was evaluated on a standard data set known as the Alate data, and on a high-dimensional industrial gas chromatography (GC) data set. The results showed that the proposed method successfully identified structure-bearing variables in both data sets and that it leads to a better subset of variables than the other feature selection methods studied. © 2002 Elsevier Science B.V. All rights reserved.
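The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a two-configuration Procrustes agreement (a value between 0 and 1) as a stand-in for the percentage of consensus in generalised Procrustes analysis, a deliberately simple GA with fixed subset size, and synthetic data. All function names, parameters, and the demo data are hypothetical.

```python
import numpy as np

def pca_scores(X, k):
    """Scores on the first k principal components of column-centred X."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]

def procrustes_agreement(A, B):
    """Agreement (0..1) between two configurations after centring, scaling to
    unit norm, and optimal rotation. Assumed stand-in for the GPA consensus."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    A = A / np.linalg.norm(A)
    B = B / np.linalg.norm(B)
    sv = np.linalg.svd(A.T @ B, compute_uv=False)
    return float(sv.sum() ** 2)

def consensus(X, subset, k):
    """Agreement between PCA scores of the full data and of a variable subset."""
    return procrustes_agreement(pca_scores(X, k), pca_scores(X[:, subset], k))

def select_features(X, q, k=2, pop_size=20, generations=30, seed=0):
    """Toy GA: evolve subsets of q variables that maximise the consensus."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    pop = [rng.choice(p, size=q, replace=False) for _ in range(pop_size)]
    for _ in range(generations):
        scores = np.array([consensus(X, ind, k) for ind in pop])
        order = np.argsort(scores)[::-1]
        elite = [pop[i] for i in order[: pop_size // 2]]  # keep the best half
        children = []
        while len(elite) + len(children) < pop_size:
            i, j = rng.choice(len(elite), size=2, replace=False)
            pool = np.union1d(elite[i], elite[j])          # crossover: merge parents
            child = rng.choice(pool, size=q, replace=False)
            if rng.random() < 0.3:                         # mutation: swap in a new variable
                outside = np.setdiff1d(np.arange(p), child)
                child[rng.integers(q)] = rng.choice(outside)
            children.append(child)
        pop = elite + children
    scores = np.array([consensus(X, ind, k) for ind in pop])
    best = pop[int(np.argmax(scores))]
    return np.sort(best), float(scores.max())

# Synthetic demo (hypothetical data): variables 0-3 carry group structure,
# variables 4-11 are low-variance noise.
demo_rng = np.random.default_rng(1)
groups = np.repeat([0, 1, 2], 20)
means = np.array([[0, 0, 0, 0], [5, 5, 0, 0], [0, 0, 5, 5]])
structured = means[groups] + demo_rng.normal(size=(60, 4))
noise = 0.5 * demo_rng.normal(size=(60, 8))
X_demo = np.hstack([structured, noise])

best_subset, best_score = select_features(X_demo, q=3, k=2)
print("selected variables:", best_subset, "agreement:", round(best_score, 3))
```

Keeping the better half of each generation unchanged means the best agreement found never decreases, which makes the sketch easy to reason about; a practical GA would typically use a more refined selection and encoding scheme.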
Pages: 123-132
Number of pages: 10