Variable selection and the interpretation of principal subspaces

Cited by: 63
Authors
Cadima, JFCL
Jolliffe, IT
Affiliations
[1] Inst Super Agron, Dept Matemat, P-1399 Lisbon, Portugal
[2] Univ Aberdeen, Kings Coll, Dept Math Sci, Aberdeen AB24 3UE, Scotland
Keywords
loadings; multiple regression; principal components; reification; stepwise selection
DOI
10.1198/108571101300325256
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07 [Science]; 0710 [Biology]; 09 [Agriculture]
Abstract
Principal component analysis is widely used in the analysis of multivariate data in the agricultural, biological, and environmental sciences. The first few principal components (PCs) of a set of variables are derived variables with optimal properties in terms of approximating the original variables. This paper considers the problem of identifying subsets of variables that best approximate the full set of variables or their first few PCs, thus stressing dimensionality reduction in terms of the original variables rather than in terms of derived variables (PCs) whose definition requires all the original variables. Criteria for selecting variables are often ill defined and may produce inappropriate subsets. Indicators of the performance of different subsets of the variables are discussed and two criteria are defined. These criteria are used in stepwise selection-type algorithms to choose good subsets. Examples are given that show, among other things, that the selection of variable subsets should not be based only on the PC loadings of the variables.
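The abstract does not state the two criteria, so the following is only a minimal sketch of the general idea of stepwise subset selection, under an assumed indicator: the fraction of the total variance of the centered data that is recovered when every variable is regressed on the chosen subset. The function names `explained_fraction` and `forward_select` are hypothetical, and the indicator is one plausible choice, not necessarily either of the paper's criteria.

```python
import numpy as np


def explained_fraction(X, subset):
    """Assumed indicator: share of total variance of the centered data
    recovered by least-squares regression on the columns in `subset`."""
    Xc = X - X.mean(axis=0)          # center each variable
    Xs = Xc[:, subset]               # the candidate subset of variables
    # Fit every original variable on the subset in one least-squares call.
    coef, *_ = np.linalg.lstsq(Xs, Xc, rcond=None)
    fitted = Xs @ coef
    return float(np.sum(fitted ** 2) / np.sum(Xc ** 2))


def forward_select(X, k):
    """Greedy forward selection: at each step add the variable that
    most increases the assumed indicator."""
    chosen = []
    remaining = list(range(X.shape[1]))
    for _ in range(k):
        best = max(remaining,
                   key=lambda j: explained_fraction(X, chosen + [j]))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

Calling `forward_select(X, k)` on an observations-by-variables array returns the indices of `k` original variables. Because the score depends on how well the subset reproduces all variables, and not on individual PC loadings, it is in the spirit of the abstract's warning against selecting on loadings alone.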
Pages: 62-79
Number of pages: 18