Shotgun Stochastic Search for "Large p" Regression

Cited by: 151
Authors
Hans, Chris [1]
Dobra, Adrian
West, Mike
Affiliations
[1] Ohio State Univ, Dept Stat, Columbus, OH 43210 USA
[2] Univ Washington, Ctr Stat & Social Sci, Seattle, WA 98195 USA
[3] Univ Washington, Dept Stat & Biobehav Nursing, Seattle, WA 98195 USA
[4] Univ Washington, Dept Hlth Syst, Seattle, WA 98195 USA
[5] Duke Univ, ISDS, Durham, NC 27708 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
model averaging; parallel computing; regression model uncertainty; stochastic search; variable selection;
DOI
10.1198/016214507000000121
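Chinese Library Classification (CLC) codes
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208 [Statistics]; 070103 [Probability theory and mathematical statistics]; 0714 [Statistics]
Abstract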
Model search in regression with very large numbers of candidate predictors raises challenges for both model specification and computation, for which standard approaches such as Markov chain Monte Carlo (MCMC) methods are often infeasible or ineffective. We describe a novel shotgun stochastic search (SSS) approach that explores "interesting" regions of the resulting high-dimensional model spaces and quickly identifies regions of high posterior probability over models. We describe algorithmic and modeling aspects, priors over the model space that induce sparsity and parsimony over and above the traditional dimension penalization implicit in Bayesian and likelihood analyses, and parallel computation using cluster computers. We discuss an example from gene expression cancer genomics, comparisons with MCMC and other methods, and theoretical and simulation-based aspects of performance characteristics in large-scale regression model searches. We also provide software implementing the methods.
Pages: 507-516
Page count: 10