Properties of principal component methods for functional and longitudinal data analysis

Cited by: 321
Authors
Hall, Peter [1]
Mueller, Hans-Georg [2]
Wang, Jane-Ling [2]
Affiliations
[1] Australian Natl Univ, Ctr Math & Appl, Canberra, ACT 0200, Australia
[2] Univ Calif Davis, Dept Stat, Davis, CA 95616 USA
Keywords
biomedical studies; curse of dimensionality; eigenfunction; eigenvalue; eigenvector; Karhunen-Loeve expansion; local polynomial methods; nonparametric; operator theory; optimal convergence rate; principal component analysis; rate of convergence; semiparametric; sparse data; spectral decomposition; smoothing
DOI
10.1214/009053606000000272
Chinese Library Classification (CLC)
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline codes
020208; 070103; 0714
Abstract
The use of principal component methods to analyze functional data is appropriate in a wide range of settings. In studies of "functional data analysis," it has often been assumed that a sample of random functions is observed precisely, in the continuum and without noise. While this has been the traditional setting for functional data analysis, in the context of longitudinal data analysis a random function typically represents a patient, or subject, who is observed at only a small number of randomly distributed points, with nonnegligible measurement error. Nevertheless, essentially the same methods can be used in both these cases, as well as in the vast number of settings that lie between them. How is performance affected by the sampling plan? In this paper we answer that question. We show that if there is a sample of n functions, or subjects, then estimation of eigenvalues is a semiparametric problem, with root-n consistent estimators, even if only a few observations are made of each function, and if each observation is encumbered by noise. However, estimation of eigenfunctions becomes a nonparametric problem when observations are sparse. The optimal convergence rates in this case are those which pertain to more familiar function-estimation settings. We also describe the effects of sampling at regularly spaced points, as opposed to random points. In particular, it is shown that there are often advantages in sampling randomly. However, even in the case of noisy data there is a threshold sampling rate (depending on the number of functions treated) above which the rate of sampling (either randomly or regularly) has negligible impact on estimator performance, no matter whether eigenvalues or eigenfunctions are being estimated.
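A minimal sketch of the pooling idea behind the eigenvalue result, using only the Python standard library. Everything in it is an illustrative assumption rather than the authors' estimator: the two-component model, the hypothetical helper names `simulate_sparse_curves`, `binned_covariance` and `top_eigenvalues`, the known zero mean, and the use of simple binning in place of the local polynomial smoothing named in the keywords. Each subject contributes only m = 4 noisy observations at random times. Raw products Y_ij Y_ik with j ≠ k are pooled across all n subjects to estimate the covariance surface; the diagonal terms j = k are dropped because their expectation is inflated by the noise variance σ². Eigenvalues of the binned matrix, multiplied by the bin width, then approximate those of the covariance operator.

```python
import math
import random


def simulate_sparse_curves(n, m, sigma, rng):
    """Simulate n zero-mean curves, each seen at m random times with noise.

    Hypothetical model (not from the paper): on [0, 1],
    X(t) = xi1 * sqrt(2) sin(2 pi t) + xi2 * sqrt(2) cos(2 pi t),
    with Var(xi1) = 4 and Var(xi2) = 1, so the true eigenvalues are 4 and 1.
    """
    curves = []
    for _ in range(n):
        xi1, xi2 = rng.gauss(0.0, 2.0), rng.gauss(0.0, 1.0)
        obs = []
        for _ in range(m):
            t = rng.random()  # random (not regularly spaced) sampling time
            x = math.sqrt(2) * (xi1 * math.sin(2 * math.pi * t)
                                + xi2 * math.cos(2 * math.pi * t))
            obs.append((t, x + rng.gauss(0.0, sigma)))  # additive measurement error
        curves.append(obs)
    return curves


def binned_covariance(curves, bins):
    """Average raw products Y_ij * Y_ik (j != k), pooled over subjects, on a grid."""
    sums = [[0.0] * bins for _ in range(bins)]
    counts = [[0] * bins for _ in range(bins)]
    for obs in curves:
        for j, (tj, yj) in enumerate(obs):
            for k, (tk, yk) in enumerate(obs):
                if j == k:
                    continue  # E[Y_ij^2] = G(t, t) + sigma^2, so skip the diagonal
                a = min(int(tj * bins), bins - 1)
                b = min(int(tk * bins), bins - 1)
                sums[a][b] += yj * yk
                counts[a][b] += 1
    # Empty cells (unlikely at this sample size) are set to zero.
    return [[sums[a][b] / counts[a][b] if counts[a][b] else 0.0
             for b in range(bins)] for a in range(bins)]


def top_eigenvalues(matrix, k, rng, iters=300):
    """Leading k eigenvalues of a symmetric matrix via power iteration and deflation."""
    size = len(matrix)
    a = [row[:] for row in matrix]
    values = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(size)]  # random start vector
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(size)) for i in range(size)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        av = [sum(a[i][j] * v[j] for j in range(size)) for i in range(size)]
        lam = sum(vi * avi for vi, avi in zip(v, av))  # Rayleigh quotient
        values.append(lam)
        # Deflate so the next pass converges to the next eigenvalue.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(size)]
             for i in range(size)]
    return values


rng = random.Random(0)
bins = 20
curves = simulate_sparse_curves(n=3000, m=4, sigma=1.0, rng=rng)
cov = binned_covariance(curves, bins)
# Multiply by the bin width h = 1 / bins to approximate the integral operator.
eigs = [lam / bins for lam in top_eigenvalues(cov, 2, rng)]
print([round(e, 2) for e in eigs])  # roughly 4 and 1; binning shrinks them slightly
```

Binning keeps the sketch dependency-free; a local linear smoother of the off-diagonal raw products would be the natural replacement for `binned_covariance`. Only two eigenvalues are recovered here, and no claim about convergence rates is demonstrated.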
Pages: 1493-1517
Page count: 25