Combining estimates in regression and classification

Cited by: 174
Authors
LeBlanc, M
Tibshirani, R
Affiliations
[1] UNIV TORONTO,DEPT PREVENT MED & BIOSTAT,TORONTO,ON M5S 1A8,CANADA
[2] UNIV TORONTO,DEPT STAT,TORONTO,ON M5S 1A8,CANADA
Keywords
bootstrap; cross-validation; model combination
DOI
10.2307/2291591
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
We consider the problem of how to combine a collection of general regression fit vectors to obtain a better predictive model. The individual fits may be from subset linear regression, ridge regression, or something more complex like a neural network. We develop a general framework for this problem and examine a cross-validation-based proposal called "model mix" or "stacking" in this context. We also derive combination methods based on the bootstrap and analytic methods and compare them in examples. Finally, we apply these ideas to classification problems where the estimated combination weights can yield insight into the structure of the problem.
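As a rough illustration of the cross-validation-based combination idea named in the abstract, the sketch below stacks two toy regression fits. It is not the authors' procedure: the two base models (`fit_mean`, `fit_line`), the leave-one-out scheme, the synthetic data, and the restriction to a single weight clipped to [0, 1] are all assumptions made here to keep the example small.

```python
import random

def fit_mean(xs, ys):
    """Constant model: always predicts the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Simple least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx > 0 else 0.0
    a = my - b * mx
    return lambda x: a + b * x

def loo_predictions(fit, xs, ys):
    """Leave-one-out predictions: each point is predicted by a fit that excludes it."""
    preds = []
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(model(xs[i]))
    return preds

def stacking_weight(ys, p1, p2):
    """Weight w in [0, 1] minimizing sum (y - (w*p1 + (1-w)*p2))^2 (assumed convex form)."""
    num = sum((y - b) * (a - b) for y, a, b in zip(ys, p1, p2))
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    if den == 0:
        return 0.5  # the two models agree everywhere; any weight gives the same fit
    return min(1.0, max(0.0, num / den))

# Synthetic data (hypothetical): a noisy straight line.
random.seed(0)
xs = [i / 10 for i in range(30)]
ys = [2.0 * x + random.gauss(0, 0.5) for x in xs]

# Step 1: cross-validated predictions from each base model.
p_mean = loo_predictions(fit_mean, xs, ys)
p_line = loo_predictions(fit_line, xs, ys)

# Step 2: choose the combination weight from the cross-validated predictions.
w_line = stacking_weight(ys, p_line, p_mean)

# Step 3: refit both models on all data and combine them with the learned weight.
mean_model, line_model = fit_mean(xs, ys), fit_line(xs, ys)

def combined(x):
    return w_line * line_model(x) + (1 - w_line) * mean_model(x)
```

Estimating the weight from held-out predictions rather than in-sample fits is what keeps the combination from simply favoring the most flexible model; with more than two base models the same step would become a (possibly constrained) least-squares regression of the response on the cross-validated predictions.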
Pages: 1641-1650
Page count: 10