A modified principal component technique based on the LASSO

Cited by: 640
Authors
Jolliffe, IT
Trendafilov, NT
Uddin, M
Affiliations
[1] Univ Aberdeen, Kings Coll, Dept Math Sci, Aberdeen AB24 3UE, Scotland
[2] Univ W England, Fac Comp Engn & Math Sci, Bristol BS16 1QY, Avon, England
[3] Univ Karachi, Dept Stat, Karachi 75270, Pakistan
Keywords
interpretation; principal component analysis; simplification
DOI
10.1198/1061860032148
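Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification code
020208 [Statistics]; 070103 [Probability theory and mathematical statistics]; 0714 [Statistics]
Abstract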
In many multivariate statistical techniques, a set of linear functions of the original p variables is produced. One of the more difficult aspects of these techniques is the interpretation of the linear functions, as these functions usually have nonzero coefficients on all p variables. A common approach is to effectively ignore (treat as zero) any coefficients less than some threshold value, so that the function becomes simple and the interpretation becomes easier for the users. Such a procedure can be misleading. There are alternatives to principal component analysis which restrict the coefficients to a smaller number of possible values in the derivation of the linear functions, or replace the principal components by "principal variables." This article introduces a new technique, borrowing an idea proposed by Tibshirani in the context of multiple regression where similar problems arise in interpreting regression equations. This approach is the so-called LASSO, the "least absolute shrinkage and selection operator," in which a bound is introduced on the sum of the absolute values of the coefficients, and in which some coefficients consequently become zero. We explore some of the properties of the new technique, both theoretically and using simulation studies, and apply it to an example.
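The thresholding practice described in the abstract, and the quantity that a LASSO-type bound would cap, can be illustrated with a small sketch. This is not the article's technique: it only shows the common approach of zeroing small loadings on the first principal component, alongside the sum of absolute loadings. The correlation matrix `R`, the cutoff of 0.3, and the function names `first_pc` and `threshold_loadings` are all assumptions made up for illustration.

```python
import numpy as np

def first_pc(R):
    """Return the leading unit eigenvector and eigenvalue of a symmetric matrix R."""
    eigvals, eigvecs = np.linalg.eigh(R)  # eigh returns eigenvalues in ascending order
    a = eigvecs[:, -1]
    # Fix the arbitrary sign so the largest-magnitude loading is positive.
    if a[np.argmax(np.abs(a))] < 0:
        a = -a
    return a, eigvals[-1]

def threshold_loadings(a, cutoff):
    """Set loadings with |a_j| < cutoff to zero, then rescale to unit length."""
    b = np.where(np.abs(a) < cutoff, 0.0, a)
    norm = np.linalg.norm(b)
    if norm == 0.0:
        raise ValueError("cutoff removes every loading")
    return b / norm

# Hypothetical 4-variable correlation matrix (illustrative values only).
R = np.array([
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.3],
    [0.1, 0.1, 0.3, 1.0],
])

a, top_variance = first_pc(R)
b = threshold_loadings(a, cutoff=0.3)  # cutoff is an arbitrary illustrative choice

variance_full = a @ R @ a      # variance of the ordinary first component
variance_sparse = b @ R @ b    # variance after thresholding; never larger
l1_full = np.abs(a).sum()      # sum of absolute loadings, the quantity a LASSO bound caps
l1_sparse = np.abs(b).sum()

print("loadings:", np.round(a, 3), "-> thresholded:", np.round(b, 3))
print("variance:", round(variance_full, 3), "->", round(variance_sparse, 3))
print("sum |loadings|:", round(l1_full, 3), "->", round(l1_sparse, 3))
```

Thresholding is applied after the component has been derived, so the zeros are imposed rather than obtained from the optimization. By contrast, the abstract describes placing the bound on the sum of absolute coefficients inside the derivation itself, so that some coefficients become exactly zero as a consequence.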
Pages: 531-547
Page count: 17