Comparing methods for multivariate nonparametric regression

Cited by: 8
Authors
Banks, DL
Olszewski, RT
Maxion, RA
Affiliations
[1] FDA CBER, Rockville, MD 20852 USA
[2] Univ Pittsburgh, Ctr Biomed Informat, Pittsburgh, PA USA
[3] Carnegie Mellon Univ, Dept Comp Sci, Pittsburgh, PA 15213 USA
Funding
National Science Foundation (USA)
Keywords
multivariate nonparametric regression; projection pursuit regression; MARS; ACE; LOESS; neural networks
DOI
10.1081/SAC-120017506
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
The ever-growing number of high-dimensional, superlarge databases requires effective analysis techniques to mine interesting information from the data. Development of new-wave methodologies for high-dimensional nonparametric regression has exploded over the last decade in an effort to meet these analysis demands. This article reports on an extensive simulation experiment that compares the performance of ten different, commonly-used regression techniques: linear regression, stepwise linear regression, additive models (AM), projection pursuit regression (PPR), recursive partitioning regression (RPR), multivariate adaptive regression splines (MARS), alternating conditional expectations (ACE), additivity and variance stabilization (AVAS), locally weighted regression (LOESS), and neural networks. Each regression technique was used to analyze multiple datasets each having a unique embedded structure; the accuracy of each technique was determined by its ability to correctly identify the embedded structure averaged over all the datasets. Datasets used in the experiment were constructed so as to have particular properties which varied across the datasets in order to determine each technique's accuracy within various environments. The dataset properties which were varied include dimension of the data, the true dimension of the embedded structure, the sample size, the amount of noise, and the complexity of the embedded structure. Analyses of the results show that all of these properties affect the accuracy of each regression technique under investigation. A mapping from data characteristics to the most effective regression technique(s) is suggested.
Pages: 541-571
Page count: 31