ON ALMOST LINEARITY OF LOW-DIMENSIONAL PROJECTIONS FROM HIGH-DIMENSIONAL DATA

Cited by: 299
Authors
HALL, P
LI, KC
Affiliations
[1] CSIRO,CANBERRA,ACT 2601,AUSTRALIA
[2] UNIV CALIF LOS ANGELES,DEPT MATH,LOS ANGELES,CA 90024
Keywords
PROJECTIONS; PROJECTION PURSUIT; DATA VISUALIZATION; DIMENSION REDUCTION; SLICED INVERSE REGRESSION; REGRESSION ANALYSIS; LINK VIOLATION;
DOI
10.1214/aos/1176349155
Chinese Library Classification
O21 [Probability and Mathematical Statistics]; C8 [Statistics];
Discipline classification code
020208 ; 070103 ; 0714 ;
Abstract
This paper studies the shapes of low dimensional projections from high dimensional data. After standardization, let $x$ be a $p$-dimensional random variable with mean zero and identity covariance. For a projection $\beta'x$, $\|\beta\| = 1$, find another direction $b$ so that the regression curve of $b'x$ against $\beta'x$ is as nonlinear as possible. We show that when the dimension of $x$ is large, for most directions $\beta$ even the most nonlinear regression is still nearly linear. Our method depends on the construction of a pair of $p$-dimensional random variables, $w_1, w_2$, called the rotational twin, and its density function with respect to the standard normal density. With this, we are able to obtain closed form expressions for measuring deviation from normality and deviation from linearity in a suitable sense of average. As an interesting by-product, from a given set of data we can find simple unbiased estimates of $E\left(f_{\beta'x}(t)/\phi_1(t) - 1\right)^2$ and $E\left[\left(\|E(x \mid \beta, \beta'x = t)\|^2 - t^2\right) f_{\beta'x}^2(t)/\phi_1^2(t)\right]$, where $\phi_1$ is the standard normal density, $f_{\beta'x}$ is the density for $\beta'x$ and the outer "$E$" is taken with respect to the uniformly distributed $\beta$. This is achieved without any smoothing and without resorting to any laborious projection procedures such as grand tours. Our result is related to the work of Diaconis and Freedman. The impact of our result on several fronts of data analysis is discussed. For example, it helps establish the validity of regression analysis when the link function of the regression model may be grossly wrong. A further generalization, which replaces $\beta'x$ by $B'x$ with $B = (\beta_1, \ldots, \beta_k)$ for $k$ randomly selected orthonormal vectors ($\beta_i$, $i = 1, \ldots, k$), helps broaden the scope of application of sliced inverse regression (SIR).
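The near-linearity claim in the abstract can be illustrated with a small numerical sketch. This is not the paper's rotational-twin construction. It is a toy calculation under an added assumption: the coordinates of $x$ are independent with mean 0, variance 1 and third moment `kappa3` (2 for a centered unit exponential). Under that assumption, $\mathrm{Cov}(b'x, (\beta'x)^2) = \kappa_3 \sum_i b_i \beta_i^2$, so the largest quadratic component over unit vectors $b$ orthogonal to $\beta$ is $\kappa_3$ times the norm of $\beta \circ \beta$ after removing its component along $\beta$. The function names and the choice of `p` values are hypothetical, and the quantity only captures the quadratic part of the nonlinearity.

```python
import numpy as np


def quadratic_nonlinearity(beta, kappa3=2.0):
    """Largest Cov(b'x, (beta'x)^2) over unit b orthogonal to beta.

    Assumes (for illustration only) iid coordinates of x with mean 0,
    variance 1 and third moment kappa3.
    """
    w = beta ** 2                      # elementwise square of beta
    w_perp = w - (w @ beta) * beta     # remove the component along beta
    return kappa3 * np.linalg.norm(w_perp)


def mean_squared_nonlinearity(p, n_dirs=200, seed=0):
    """Average the squared measure over random directions on the unit sphere."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_dirs, p))
    betas = z / np.linalg.norm(z, axis=1, keepdims=True)  # uniform directions
    return float(np.mean([quadratic_nonlinearity(b) ** 2 for b in betas]))


results = {p: mean_squared_nonlinearity(p) for p in (10, 100, 1000)}
for p, val in results.items():
    print(f"p = {p:4d}: mean squared quadratic nonlinearity = {val:.4f}")
```

In this toy setting the averaged measure shrinks as `p` grows, roughly in proportion to 1/p, which is consistent with the abstract's statement that for most directions even the most nonlinear regression is nearly linear when the dimension is large.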
Pages: 867-889
Page count: 23
Related papers
25 in total
[1]  
Brillinger D. R., 1983, WADSWORTH STATIST PR, P97
[2]   IDENTIFICATION OF A PARTICULAR NONLINEAR TIME-SERIES SYSTEM [J].
BRILLINGER, DR .
BIOMETRIKA, 1977, 64 (03) :509-515
[3]   MEASUREMENT ERROR REGRESSION WITH UNKNOWN LINK - DIMENSION REDUCTION AND DATA VISUALIZATION [J].
CARROLL, RJ ;
LI, KC .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1992, 87 (420) :1040-1050
[4]  
CHEN H, 1991, ANN STAT, V19, P142
[5]  
CLEVELAND WC, 1988, DYNAMIC GRAPHICS STA
[6]  
CLEVELAND WS, 1988, COLLECTED WORKS JW T, V5
[7]  
COOK RD, 1991, J AM STAT ASSOC, V86, P328, DOI 10.2307/2290564
[8]   ASYMPTOTICS OF GRAPHICAL PROJECTION PURSUIT [J].
DIACONIS, P ;
FREEDMAN, D .
ANNALS OF STATISTICS, 1984, 12 (03) :793-815
[9]   PROJECTION-BASED APPROXIMATION AND A DUALITY WITH KERNEL METHODS [J].
DONOHO, DL ;
JOHNSTONE, IM .
ANNALS OF STATISTICS, 1989, 17 (01) :58-106
[10]   SLICING REGRESSION - A LINK-FREE REGRESSION METHOD [J].
DUAN, N ;
LI, KC .
ANNALS OF STATISTICS, 1991, 19 (02) :505-530